
Conference on Census Undercount

Proceedings of the 1980 Conference

U.S. Department of Commerce, Bureau of the Census, Washington, D.C. 20233


Issued July 1980 


U.S. Department of Commerce
Philip M. Klutznick, Secretary
Luther H. Hodges, Jr., Under Secretary
Courtenay M. Slater, Chief Economist

Bureau of the Census
Vincent P. Barabba, Director


Vincent P. Barabba, Director
Daniel B. Levine, Deputy Director
George E. Hall, Associate Director for Demographic Fields
Barbara A. Bailar, Associate Director for Statistical Standards and Methodology
Meyer Zitter, Chief
Charles D. Jones, Chief


The Conference on Census Undercount was the result of efforts by many people. The conference 
was organized and managed by Meyer Zitter and Richard A. Engels (Population Division). Mr. Engels 
also prepared and coordinated this report. The conference minutes were developed and preliminary 
editing of the proceedings was completed by Frederick G. Bohme (Data User Services Division), 
Kennon R. Copeland (Statistical Methods Division), and Edith K. McArthur (Population Division). 
The report was prepared for publication under the supervision of C. Maureen Padgett, assisted by 
Barbara Materre (Publication Services Division). Special thanks are due Sandra K. Morris (Population 
Division) for handling the conference registration, the distribution of materials, and detailed arrange- 
ments. We also acknowledge the assistance provided by Gilbert Felton, Vicki Davis, Helen Cater, 
Sandra Miller, and Pat Lintner (Population Division). Barbara Milton (Director's Office) and 
Ronald W. Cyr (Administrative Services Division) provided administrative and other conference 
support. Conrad Taeuber (Georgetown University) directed the work of the Conference Steering 
Committee and served as the Chair. 

Library of Congress Cataloging in Publication Data

Conference on Census Undercount, Arlington, Va., 1980.
Conference on Census Undercount.

"Issued July 1980."

Sponsored by the U.S. Bureau of the Census. C 3.2:Un2/2

1. United States—Census—Congresses. 2. United States—Census, 20th, 1980—Congresses. I. United States. Bureau of the Census. II. Title: Census

HA37.U55C63 1980   001.4'33   80-607998

For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402


The importance of census counts for use (1) in Congressional reapportionment, (2) 
in the allocation of funds and resources to local areas, (3) to provide a base for more cur- 
rent estimates that also are used for allocating Federal resources, and (4) to serve as a 
foundation for the planning and evaluation of private and public programs has caused 
increasing interest in the accuracy of census counts and in the availability of estimates 
of coverage. This is over and above the more obvious application of census results as the 
basis for political redistricting, now often involving counts for areas as small as city 
blocks. These concerns, and the Census Bureau's continuing interest in measuring and 
reporting on the quality of the data obtained in the decennial census, led to the con- 
clusion that it was appropriate to take as broad a perspective as possible on the issue 
of census adjustment. 

Although the Bureau itself has been active in research concerning the undercount, it 
also recognized the need to obtain the views of the general research and statistical community. This accords with the findings of a National Academy of Sciences
panel convened in 1977 to examine the planning for the 1980 census. The panel stressed 
the importance of the counts, concluded that the figures should be adjusted, and recom- 
mended that the Bureau continue to investigate adequate technical means for measuring 
the undercount and for adjusting the counts. 

As a first step, the Bureau of the Census convened a census undercount workshop, 
September 5-8, 1979, at Reston, Va. The workshop participants included management 
and professional personnel from the Bureau of the Census, the Department of Commerce, 
and a few additional participants familiar with the undercount issue and its implications. 
The workshop was structured under the guidelines of a decisionmaking system, "Strategic Assumption Surfacing and Testing for Strategic Management," developed by Dr. Richard O. Mason of the University of Southern California and Dr. Ian I. Mitroff of the University of Pittsburgh.

The specific purpose of the workshop was to determine whether or not the discussion 
of the undercount to date had been sufficiently comprehensive to identify all of the 
issues and assumptions relative to undercount measurement and adjustment. Assumptions 
identified in the workshop were examined in terms of their relationships to other assump- 
tions and the importance and degree of certainty that should be attached to each. 

Following the undercount workshop, the Bureau sponsored the Conference on Census 
Undercount on February 25-26, 1980, in Arlington, Va. The conference was designed to 
provide a forum for considering alternative approaches to measuring the census under- 
count and to assess the implications of adjusting the census counts. 

This volume contains the conference papers and the discussion of the papers at the 
conference. In order to investigate as broad a range of concerns as possible at the conference, the Bureau undertook a general solicitation of papers on undercount issues (see appendix A). Papers were solicited on the undercount in general, but also on a number of specific concerns:

• Methods of measuring the undercount for subnational areas, including the quality of the estimates of undercount in relation to the size and other characteristics of the area, and the feasibility of providing accuracy checks or confidence intervals.

• The timing of the adjustment(s) for undercount.

• Measuring and adjusting for the undercount and misreporting for factors other than total population, such as social and economic characteristics.

• The use of adjusted figures in Federal programs and the impact of adjustments on the Federal statistical system.

• Political and legal issues in making adjustments to the census counts.

• The effects of adjustments on equity in the distribution of Federal funds.

• Decision theory and theoretical aspects of adjustment.

Under the direction of a Conference Steering Committee, 17 papers were selected to 
be presented at the conference. In addition to reviewing the papers proposed for the 
conference, the Steering Committee guided the general planning and program for the 
conference. The members of the Steering Committee were: 

Conrad Taeuber, Chair Georgetown University 

William G. Cochran Harvard University (Deceased) 

Nathan Keyfitz Harvard University 

Leslie Kish University of Michigan 

William H. Kruskal University of Chicago 

Daniel B. Levine Bureau of the Census

Evelyn Mann City of New York 

Harry V. Roberts University of Chicago 

Julian Samora University of Notre Dame 

Richard M. Scammon Elections Research Center 

Richard Smolka American University 

Bruce Spencer Northwestern University 

Phyllis A. Wallace Massachusetts Institute of Technology 

Eddie Williams Joint Center for Political Studies 

Meyer Zitter Bureau of the Census 

We are indebted to the Steering Committee for its work, to the authors and dis- 
cussants for their careful attention to the undercount issues, and to the conference par- 
ticipants for their observations and suggestions. 

Vincent P. Barabba 


U.S. Bureau of the Census 


Conference Overview and Summary 

Major Conference Findings . . . Conrad Taeuber 3 

The Bureau's Agenda on the Undercount Decision . . . Vincent P. Barabba 5 


Welcome and Introduction . . . Vincent P. Barabba 9 

Census Undercount: Time To Adjust . . . Robert Garcia 12 

The Census Bureau Experience and Plans . . . Jacob S. Siegel and Charles D.
Jones 15

Adjustments: Pro and Con 

Facing the Fact of Census Incompleteness . . . Nathan Keyfitz 27 

Adjusting for Decennial Census Undercount: An Environmental Impact 

Statement . . . Peter K. Francese 37 

Comments . . . Robert P. Strauss 44 

Floor Discussion 47 

The Congressional Perspective . . . Daniel P. Moynihan 49 

Methodological Considerations I 

Can Regression Be Used To Estimate Local Undercount Adjustments? 

Eugene P. Ericksen 55 

Modifying Census Counts . . . I. Richard Savage 62

Comments . . . William G. Madow 76 

Floor Discussion 78 

Methodological Considerations II 

Diverse Adjustments for Missing Data . . . Leslie Kish 83 

The Analysis of Census Undercount From a Postenumeration Survey 

A. P. Dempster and T. J. Tomberlin 88

Some Empirical Bayes Approaches to Estimating the 1980 Census Undercount 

for Counties . . . Robert E. Fay, III 95 

Comments . . . Tommy Wright 100 

Floor Discussion 103 

Impacts of Adjusting 

The Impact of Census Undercoverage on Federal Programs . . . Courtenay M. 

Slater 107 

The Impact of the Undercount on State and Local Government Transfers

Herrington J. Bryce 112

Comments . . .Wray Smith 125 

Floor Discussion 126 


Other Methods and Impacts 

The Synthetic Method: Its Feasibility for Deriving the Census Undercount for 

States and Local Areas . . . Robert B. Hill 129 

Comments . . . Joseph Waksberg 142 

Floor Discussion 144 

The Impact of an Adjustment to the 1980 Census on Congressional and 

Legislative Reapportionment . . . Carl P. Carlucci 145 

Comments . . . Richard Smolka 153 

The International Experience 

Adjustment for Census Underenumeration: The Australian Situation 
Brian Doyle 157

Census Undercount: The International Experience . . . Meyer Zitter and 
Edith K. McArthur 164 

Floor Discussion 181 

Legal Concerns 

Legal and Constitutional Constraints on Census Undercount Adjustment 

Donald P. McCullum 185

Floor Discussion 189 

Considerations of Statistical Equity 

Should the Census Count Be Adjusted for Allocation Purposes: Equity 
Considerations . . . Ivan P. Fellegi 193 

Implications of Equity and Accuracy for Undercount Adjustment: A 
Decision-Theoretic Approach . . . Bruce Spencer 204 

Comments . . . Harry V. Roberts 217 

Floor Discussion 223 

Appendix A 

Letter of Invitation for Conference Papers 227 

Request for Papers 229 

Notice Inviting Papers 230 

Appendix B 

Conference Attendance List 233 

Appendix C 

Conference Program 243 

Conference Overview and Summary

Major Conference Findings 

Conrad Taeuber 

Conference Chairman 
Georgetown University 

Although it was not expected that the conference partici- 
pants would reach unanimity on the issues examined at 
the conference, some general directions can be identified 
in the discussion. They in no way represent general agree- 
ment by the participants, however, and should not be con- 
sidered to be recommendations from the conference. 

1. There would have been ready assent among the con- 
ference participants to any statement that emphasizes 
the importance of obtaining as nearly as possible a 
complete count. 

2. There appeared to be general consensus that some form of adjustment for the undercount is needed, since areas with concentrations of persons likely to be missed may be receiving less than their share of funds that are distributed wholly or in part on the basis of population.

3. There was lack of agreement on the desirability of 
making adjustments to the traditional census re- 
porting for apportionment. Congressman Garcia was 
in favor of adjusted counts for all uses, while Senator 
Moynihan wanted no adjustments made for the ap- 
portionment of seats in the House. Judge McCullum 
(Alameda County Superior Court) concluded in his 
prepared remarks that only an "enumeration" would 
meet the constitutional requirement regarding the 
allocation of seats among the States. Within the 
States, and for State legislative purposes, there is no 
such limitation. The States are free to develop their 
own apportionment and redistricting procedures, 
and even to disregard the census figures or estimates, 
if they believe they have a superior set of data. The 
standard likely to be applied is that the alternative 
data must be clear, cogent, and convincing. Equity 
and equality as mandated by the one-man-one-vote 
rule would also apply, but the courts have some 
leeway in deciding what those standards require. 

Judge McCullum emphasized that, except for the 
apportionment of House seats among the States, ad- 
justed figures would be called for and that, if the 
Executive Branch did not supply such figures, the 
courts were likely to do so. In that case, the results 
might be far less acceptable than if the adjustments 
had been made by technical experts. Here Keyfitz's 
comment is applicable. He reported that he had dis- 
cussed the question of adjusted figures with a judge 
who told him that if confronted with the issue, the 

court would call on an expert. Keyfitz had responded 
that he might be considered to be such an expert and 
that he did not have the answer. 

4. There was one strong statement arguing that no ad- 
justment should be made. It was felt that the presumed 
greater accuracy of adjusted counts would not be crit- 
ical to business uses. In addition, the improvement in 
accuracy would not offset the delays involved and the 
confusion of "two sets of books." It was also argued 
that adjustment would beget adjustment. That is, 
once adjustments by age, sex, and race had been 
made, there would be demands for characteristics 
of the persons added as a result of the adjustment.

5. There appeared to be general support for the view 
that if an adjustment were to be made, it should be 
as simple as possible, something that would not only 
be viewed as valid but could be readily explained. 
This would appear to lend support to synthetic esti- 
mates rather than regression estimates or other more 
sophisticated techniques for States and other sub- 
State areas. 

6. There was some uncertainty concerning the timetable 
under which any adjustments might be made. If full 
reliance were to be placed on demographic methods 
of estimating the undercount, the results would be 
available earlier than if the results of the postenumer- 
ation survey are to be brought into the computations. 

7. Users would probably be willing to sacrifice some 
fine tuning of the estimates of the undercount if that 
would lead to a more timely release of the estimates 
and of any adjustments that might be made. 

8. There was general agreement that the decision to 
adjust or not should be made before the census 
results are available. If the decision is made to adjust, 
the announcement of that decision should be ac- 
companied by a statement of the procedure to be 
used in adjustment. 

9. There was little discussion of the form in which 
adjusted numbers should be released. There seemed 
to be agreement that, if issued, they should not re- 
place the numbers published on the basis of the 
enumeration, and should not be carried into cross 
tabulations. It was suggested that this might be accom- 
plished by issuing "adjustment factors" rather than 
adjusted counts, leaving it to individual users to make 
use of the adjustment factors as suited their needs.

10. There are special problems involved in securing ad- 
justment factors for Hispanics and other minority 
groups. One suggestion was that the factor for blacks 
should be applied to Hispanics and all nonwhite 
groups. Some Hispanic spokesmen, however, claim 
that the undercount for their group is greater than 
that for blacks. The Bureau is on record as speculating 
that it falls somewhere between the factors for 
whites and blacks. 

11. The subject of illegal aliens or undocumented workers was discussed as a question that needs to be recognized, though there was no clear proposal by which they might be included in estimates of the undercount.

12. It was presumed that adjustments, if any, would con- 
tribute to equity in the distribution of funds and any 
other benefits. Statistical methods for measuring the 
improvement in equity were presented. One of the 
authors repeatedly stressed that improvements in 
equity must be measured in terms of achievement 
of the legislative intent. 

13. A review of the statistical needs of Federal agencies 
led to the conclusion that the underreporting of in- 
come in the census was potentially a more difficult 
issue than the undercount of population. It is not 
clear what the impact of the incomes of the un- 
counted persons would be on distributions or on 
measures of central tendency. The effect on poverty 
measures is likewise uncertain. 

14. Some reference was made to the variety of provisions 
in the laws governing the distribution of funds from 
the Federal Government. Some laws specify the most 
recent census, others speak of estimates by the De- 
partment of Commerce, and there are a number of 
variants of these. Attention may need to be paid to 
the legal actions that may be necessary if adjusted 

numbers are to be used. Interest was expressed in 
the experience of Australia where census results are 
published as collected, but postcensal estimates of 
population take into account an adjustment for underenumeration.

15. There were repeated references to the difference be- 
tween "imputations" and "adjustments." It was 
pointed out that the proposed adjustments would 
not be significantly different from the procedures 
used for 1970 when additions were made to the 
enumerated population. The postenumeration post 
office check and the vacancy check in connection 
with the 1970 census were viewed as on the thin 
edge. The imputations based on these two postenu- 
meration checks were distributed at random and the 
characteristics of the imputed individuals were de- 
rived by statistical procedures. The imputations made 
on the basis of the checks after the enumeration are 
not likely to recur in 1980 because the mail-out mail- 
back procedure is nearly universal, as is the program 
to visit each unit that is initially reported as vacant. 

16. There was a call for more and intensive research into 
the means of reducing the undercount as well as into 
appropriate methods for making adjustments. Tech- 
niques of matching offer possibilities that are only 
partially realized due to the primitive state of the 
methodology. It was suggested that far too little use 
has been made of the opportunities created through 
the availability of administrative records. 

17. There was a plea that the data from any postenumer- 
ation analysis be made available promptly to research 
workers outside the Bureau of the Census for inde- 
pendent analyses. 

18. Attention was called to the likelihood that the undercount would lead to a dilution of the strength of liberal and big-city representatives in the House.
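Two of the findings above lend themselves to a concrete sketch: the simple synthetic estimation favored in point 5, and the publication of adjustment factors rather than adjusted counts suggested in point 9. The group labels, undercount rates, and counts below are entirely hypothetical, not figures from the conference; the sketch also assumes the common convention that an enumerated count represents (1 - rate) of the true group population.

```python
# Hypothetical sketch of points 5 and 9: a synthetic undercount estimate
# for one area, and the "adjustment factor" a user could apply to the
# published (unadjusted) count. All rates and counts are invented.

# National undercount rates by demographic group (hypothetical).
NATIONAL_RATES = {"group_a": 0.08, "group_b": 0.02}

def synthetic_adjusted_count(area_counts):
    """Inflate each group's enumerated count by its national rate.

    Assumes the enumerated count is (1 - rate) of the true count.
    """
    return sum(
        counted / (1.0 - NATIONAL_RATES[group])
        for group, counted in area_counts.items()
    )

area = {"group_a": 50_000, "group_b": 150_000}   # enumerated counts
enumerated_total = sum(area.values())
adjusted_total = synthetic_adjusted_count(area)

# Point 9: publish the factor, not the adjusted count. Users who want
# adjusted figures multiply; the enumerated count stands unchanged.
adjustment_factor = adjusted_total / enumerated_total
```

Issuing only the factor leaves the enumerated figures intact and lets each user decide whether, and where, to apply it, which is precisely the appeal noted in point 9.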

The Bureau's Agenda on the 
Undercount Decision 

Vincent P. Barabba 


Bureau of the Census 

How do we plan to use the comments from this confer- 
ence in our decision process? 

First, the process began with the publication of the 1978 
report of the Panel on Decennial Census Plans of the National 
Academy of Sciences indicating the importance of census 
undercounts. Second, the Census Bureau followed this 
activity with a workshop in September 1979, which at- 
tempted to surface the critical assumptions related to the 
various adjustment procedures discussed by the Panel. And 
third, we have this conference, which has attempted to 
bring together the different perspectives of various re- 
searchers and interest groups. We have heard discussions on 
statistical methods. We have heard economic cost/benefit 
analysis. Social and political considerations have been brought 
up. And we have heard emphasis on the need to educate the 
public and all of its subsegments, including the Congress. 

We have not attempted at this conference to relate the 
issues of these various perspectives to each other. We intend, 
however, to bring together this synthesis of issues, and here 
is how we plan to do it. 

1. Within the next 2 months we will assimilate and report 
on this conference and obtain further comments on 
the views that were presented. 

2. Based on the conference and other inputs, a series of 
working papers will be prepared for general distribu- 

tion to highlight and clarify the major elements of 
interest and concern on the adjustment issue. 

3. In September 1980, we will conduct a workshop to 
synthesize findings and discuss possible recommenda- 
tions. At this workshop, we will deal with the inte- 
grating issues, some of which are as follows: 

• Do the benefits of adjusting for undercount out- 
weigh the cost? 

• How do various interest groups perceive the benefits? 

• What will the law allow? 

• What will the political system allow? 

• Can we combine the greatest benefits of some 
techniques with the greatest benefits of others? In 
essence, can we develop a win/win situation? 

4. Following the September workshop, we will officially 
publish a document of our findings, making explicit 
the critical assumptions that will underlie our final decision.

5. In a November/December 1980 time frame, we will 
develop our final decision on whether— and if yes— 
when and how to adjust for undercount, based on all 
available information, including any preliminary assess- 
ments as to probable undercount rates for 1980. The 
information obtained through this conference will 
have had a very definite role in shaping the decision. 


Welcome and Introduction 

Vincent P. Barabba 

Bureau of the Census 

I am very pleased to welcome all of you to this confer- 
ence. I also wish to express appreciation in advance to all of 
those who helped prepare for this event, including the par- 
ticipants who have prepared papers which will give us much 
to discuss in the next 2 days. 

Less than 5 weeks from today, a census questionnaire will 
be delivered to virtually every household in the United States. 
After that, our success will depend to a great extent on the 
American people, and the thousands of individuals, cities 
and towns, and organizations that are making extraordinary 
efforts to encourage cooperation and to achieve complete coverage.

Nonetheless, I doubt that we would be here today if we 
were sure that we would achieve 100-percent success. Some 
degree of census undercount is a sure companion to a free 
and mobile society. So we will be focusing on what to do 
about census undercount, and whether, when, and how to 
adjust for it. 

The next point is that I doubt if we would be here if it 
were February of 1970 instead of 1980. Before the last 
census, concern about undercount was largely a quiet aca- 
demic exercise. In fact, one of our two distinguished con- 
gressional guests on the program today, Senator Moynihan, 
organized a conference 13 years ago on very similar topics, 
and the level of interest was considerably lower than now. 

The world has changed very much in a single decade. 
There were, of course, complaints about undercount in 1970 
from many communities after the census. But it was really 
the dramatic social changes from the midsixties to the 
midseventies that focused national attention on the benefits 
of a population census. 

There were two major developments in particular. First 
there was the civil rights movement, together with landmark 
court cases concerning apportionment and redistricting. As 
members of minority groups gained access to the political 
process, the use of census data to determine equitable 
representation at all levels of government became an instru- 
ment of progress, and concern for the adequacy of the 
counts increased sharply. 

The other major development was the host of statutes in 
this period that moved Federal resources back to State and 
local governments to address some of the problems of edu- 
cation, housing, and other social needs. It is now estimated 
that some $50 billion is distributed annually by law directly 
or indirectly on the basis of census data. These two devel- 
opments have placed the question of equity squarely before 
us, and a key issue is whether, and how, adjustments for 

missing data would produce more equity in the allocation 
of shares of resources. 

Before I get into the issues, let me simply point out that 
the most important issue is for the Census Bureau to make 
every effort to achieve the most complete count possible. 
The Bureau has taken a number of major steps in this census 
to achieve that goal, and we are spending upwards of $200 
million on better coverage. These are either new steps or 
vastly improved procedures over the last census. Among 
others, they include the following elements: We have de- 
veloped lists of addresses all across the country. We will 
have several on-the-ground checks of all addresses in cooper- 
ation with the Postal Service before and during the census. 
We will have new procedures to improve coverage within the 
housing units. We have a new program where we have invited 
the highest elected officials in every city and county to review the preliminary counts and provide feedback to the Census Bureau.

Just as important as these procedural steps, we are work- 
ing as never before to convince people to cooperate with 
the census. This outreach effort includes widespread adver- 
tising and public information programs with the media, 
private industry, public utilities, and other institutions. It 
also includes a grass roots program where more than 200 
full-time Census Bureau workers are explaining the census 
to minority populations in their communities and encourag- 
ing local organizations to do the same. 

And finally, we have for 5 years been working with 
Census Advisory Committees that represent the black, 
Hispanic, and Asian and Pacific Island populations to achieve 
the best possible count. 

I am, at the present time, in my fourth career as a person 
deeply involved in the development and use of statistics. Let 
me identify briefly how my experiences in the careers have 
affected the way that I approach the issue of faulty and 
missing data. 

In my first career, I provided information to decision- 
makers in political campaigns. Relative to faulty or missing 
data, we seldom had enough time, because of deadlines—election days are never postponed—and very limited resources, to really deal with the problem in a meaningful way.

My second career was at the Census Bureau at a time 
when the evaluation of the 1970 census was being com- 
pleted and when the 1980 census was being planned. Given 
the interest, and the availability of resources, we were able 
to devote the time and effort necessary to achieve consider- 
able success in identifying missing and faulty data. As a 


result, the strengths and the weaknesses of the 1970 census 
data were made known to many, and I believe that in doing 
so, we greatly improved their utility to the users. 

My third career was in a large private corporation where 
we faced marketing campaigns that involved time pressures 
similar to those I experienced in political campaigns. How- 
ever, because of our size, the costs associated with our de- 
cisions were of such magnitude that improving the data was 
absolutely essential. We simply had to find the time. The 
resources, because of our size, were available. 

In one of our market surveys, one focal point for analysis 
was a unit of measurement we labeled "the establishment." 
We found it necessary to assign characteristics for unreported data—in this case, missing data items within the establishment—in order to produce a sound analysis. It
turned out that involving eventual users of the data in 
handling these imputations produced some surprising and 
pleasant results. 

First, we held discussions to establish which variables— 
or data elements— would require imputation. The very fact 
that we held these discussions significantly reduced the 
number of imputed variables and also focused the users' 
thinking about the quality of the data items. This user con- 
tact helped us establish realistic expectations concerning 
the data well in advance of the results. The frankness of 
the discussions also reduced the extent to which the user 
community perceived the process of adjusting the data to 
be "juggling" the figures, or "massaging" or "fudging" the data.

Another advantage to holding the discussions was that we 
were able to agree on the criteria to use for making imputa- 
tions, bringing to bear legitimate opinions and settling dis- 
putes among users prior to the imputation process. Further- 
more, the discussions highlighted which pieces of data were 
really important from a user point of view. The overall result 
of the discussions was an improvement of the imputation 
process and at the same time, a reduction in anxiety on the 
part of the users. 

Finally, using standard operating procedures developed 
at the Census Bureau, we gave the users the percent of im- 
puted data for each item, which showed them that nothing 
was being hidden and also gave them greater insight for use 
in interpreting and using the results of the survey. 

I offer this experience from my third career because it 
relates to one of the major issues I now face in my fourth 
career as we attempt to conduct the 1980 census. This time, 
I have the responsibility of directing the implementation of 
the census plan that was begun during my second career, 
and one of the major concerns in doing this job is dealing 
with any missing or faulty data. There are a number of as- 
pects when it comes to missing or faulty data in a census, 
but I'd like to concentrate on the aspect of most concern 
to this conference, and that is the problem of undercount, 
or more specifically, accounting for persons missing from 
the count. 

In tackling this issue before we are actually faced with 
the new numbers next year, I hope to follow the lead from 
my experience both in the public sector and private in- 
dustry—that is, to get input from involved parties prior to 
any decision. This meeting is an example. In the end, we 
hope to openly distribute a plan for making the ultimate 
decision and a plan for the implementation of adjustment, 
if the decision is to adjust, prior to any actual decision. 

It is my impression that too often we either do not learn 
from or use the results of the various commissions or con- 
ferences within which we participate. Because of this con- 
cern, the Bureau has made a conscientious effort to thor- 
oughly investigate the previous activities that have been 
directed toward this problem. This investigation has pro- 
vided us with significant insight and direction. 

However, in this investigation, we have found wanting, 
in some instances, a challenging type of review for some of 
the assumptions which underlie the many conclusions that 
have been brought forward on this issue. For example, in the 
report of the Panel of the National Academy of Sciences, 
the Panel concluded that some kind of adjustment for 
undercount is feasible and that the technical responsibility 
for procedures should rest with the Bureau of the Census. 
The Panel also recommended that we state publicly, before 
the census date, the general methods we would follow if 
adjustments are to be made. (I assume, of course, it will 
not come as a surprise to anyone in this audience that I 
do not have that statement ready today.) 

There are at least two assumptions that underlie the 
conclusions of the Panel that I would like to use in defining 
what I mean by undergoing a challenging review. First, the 
idea of stating a convention in advance, of course, has 
obvious attractions. Ideally, if everyone agreed in advance 
how the numbers would be adjusted before seeing them, 
then there would be less reason to contest the results. That 
assumes, of course, that whether you are a "winner" or a 
"loser" (as the result of any adjustment process, the "ac- 
curacy" of which can be legitimately questioned), you will 
accept (without argument) the results of a predetermined 
adjustment process. 

Second, the Panel also observed that making adjustments 
for missed people could be seen in principle as an extension 
of other techniques the Bureau has previously applied to 
correct deficiencies both in the counts and reported charac- 
teristics. However, the assumption of precedent having been 
set has been debated at length within the Bureau, and it is 
not at all clear that all of the 1970 procedures are simple 
extensions of the same principle. Some corrections, for ex- 
ample, were simply the result of correcting faulty geography, 
replacing lost materials, or using convincing second-hand 
evidence of people existing in a specified place— even though 
we did not interview them directly. 

The nearest thing to an adjustment procedure for the
uncounted was the national vacancy check, in which classes of
people were added to the actual counts on the basis of
revisits. But this applied only to a sample of housing units
originally classified as vacant. One school of thought is that 
even this limited procedure, which was not designed in 
advance but developed during the enumeration to correct 
defects, went beyond the narrow technical meaning of a 
census enumeration. 

Indeed, the Census Bureau and the Congress were suffi- 
ciently concerned about adding people in 1970 based on a 
sample check of vacant units, to develop 1980 plans costing 
many millions of dollars to go back, in 1980, to all units 
originally classified as vacant. Thus, people would be added 
based on complete field checks rather than, as in 1970, on 
the basis of a probability that a proportion of vacant units 
were really occupied. 

Again, I use these two assumptions not because I disagree 
with the conclusion of the National Academy Panel that it is 
feasible to adjust. I mention them because I want to build 
on the work of that distinguished Panel. We want to accept 
that which stands the test of challenge from multiple per- 
spectives. We will then either reject or modify that which 
fails to stand on its own. Of course, I do not mean a statis- 
tical test. I mean the test of acceptability by those who will 
be affected by the decision. 

Let me continue developing an operational definition of a 
challenging test by using the same two assumptions. Re- 
member now, the two assumptions are: First, those to be 
affected by an adjustment of the enumeration will agree on 
an adjustment procedure prior to an assessment of whether 
they will be impacted positively or negatively, and second, 
the Bureau's actions of the past will be acceptable precedent 
to justify current actions. 

If you accept those assumptions, it seems to me that you 
will also accept the following: A very senior congressional 
delegation from State A will accept without contest the 
apportionment of the 435th member of the House of Rep- 
resentatives to State B rather than their own State on the
basis of fewer than 300 people. They will do so even when
the final count for the apportionment was adjusted by the 
Census Bureau, using a projection from a revisit to a national 
sample of vacant dwelling units so that State B had 300 
more people added to its "count" than did State A. 

I bring this up not to cast aspersions on previous adjust- 
ments, nor do I even assume that the scenario I just described 
could not be accepted. I could, of course, raise similar 
challenges to the assumptions required before the conclusion 
"not to adjust" could be accepted. The central point is that all 
of the critical assumptions must be tested before the Bureau 
makes its recommendation. 

I say this at the beginning of this conference because you,
as a group, have been
selected for your specific knowledge and expertise and the 
many backgrounds and interests that you represent. We want 
to hear not only your ideas, but how you react to the ideas 
of others and how they react to yours. We will listen and we 
will listen carefully, because this conference is a very im- 
portant element of the process by which we will ultimately 
make our recommendation. 

Whatever the final decisions may be, our overall objective 
is to ensure that they are well-informed, well-understood,
and open. Because the decisions are of such basic impor- 
tance, and the issues are complex, we are going to take a 
bit more time getting to them than some would prefer. 
For that reason, I ask for your patience as well as your 
continuing attention to the process we go through to get to them.

I am strongly committed to keeping everyone informed 
on the process. I am also strongly committed to do what is 
right by the American people. I hope that by keeping our 
thought process open to you, you will accept the honesty 
and sincerity of our effort— even if you don't like our 
eventual conclusion. After all, I wouldn't want my fifth 
career to start prematurely. 

Census Undercount: Time To Adjust 

Robert Garcia 

U.S. House of Representatives 

My interest in the census undercount preceded my tenure 
as the Chairman of the House Census Subcommittee. As a 
State senator and the minority leader of the New York State 
Senate, I know what redistricting is all about. I became dis- 
tinctly aware of the difficulties caused by the census under- 
count. Even before then, I had been aware of the inequities 
resulting from the fact that I came from an area with a large 
number of people who are never recorded in the census. This 
special perspective was a main reason why I sought a place on 
the House Census Subcommittee and plan to remain as its 
chairman for as long as the people of the Bronx continue to 
send me back to Congress. 

I represent a district with a large black and Puerto Rican 
population— the very kind of people who the Census Bureau 
says are the most frequently missed. 

Since becoming chairman of the House Census Subcom- 
mittee, I have been very much impressed to learn about the 
research the Bureau of the Census has conducted regarding 
census errors. But, at the beginning of this conference, let us 
recognize that disputes about the accuracy of census results 
occurred long before the research began. 

In fact, almost every census since 1790 involved an under- 
count controversy. There were heated congressional debates
on the count during the 19th century. For example, census 
results were challenged by States during the 1830's, the 
1850's, and 1860's. In 1840, the American Statistical Associ- 
ation issued a report criticizing the accuracy of census results. 
After the census of 1870, Nebraska decided to elect an extra 
Member to Congress because they thought that the State had 
been undercounted. Congress considered their claim on its 
merits, but finally declined to seat the extra Member. On 
three previous occasions, Congress decided to allow States
an extra seat because of errors in the census. In 1890, New 
York City demanded a recount. In 1920, with each side 
claiming that they had been undercounted, the Congress was 
unable to decide upon a bill to implement the reappor- 
tionment. Disputes about census accuracy are certainly 
not new. 

In two respects, we face a totally different situation today. 
First, after the problems created by the census of 1920, the 
Congress enacted legislation giving the Census Bureau much 
greater latitude in conducting the census than it ever had 
before. The permanent Census Bureau, which was estab- 
lished in 1902, was now encouraged to increase the pro- 
fessional quality of its staff, which assumed a greater role in 
deciding upon the subjects covered, the rates of pay, and 
the methods of census enumeration. We are very fortunate 

that this occurred at a time of great advances in the sciences 
of statistics and demography. The application of sampling 
to the work of the Census Bureau, and the development of a 
more refined notion of the idea of census and survey error 
has laid the groundwork for the adjustments you will be 
discussing over the next 2 days. 

Because there has been some talk that the undercount 
problem was in part created by the increased awareness 
resulting from these studies, I want you to know that it is 
my view that these studies (especially the work of Jay 
Siegel) have not decreased but rather increased the credi- 
bility of census results. Before the studies were conducted, 
the debates about the undercount were cast in vague terms 
which made the issue intractable. Now, we can base our 
consideration of the issue upon solid scientific work— work 
which is notable for its self-conscious attention to its own 
limitations. This kind of careful and professional approach 
has increased the confidence the Congress has in the Census 
Bureau. I am here to urge you to continue in that tradition. 

The second respect in which our situation differs from 
that facing the censuses before 1930 is that because of all 
the advances that have been made since then, we have come 
to rely upon census results in our policy decisions, for plan- 
ning, and, most importantly, in the distribution of Federal 
benefits. Even where Federal grants are discretionary, popu- 
lation and characteristics information drawn from the census 
and the current estimates based on the census are an impor- 
tant consideration in the decisionmaking process. 

This increased use of census results has heightened our 
concern that the procedures used by the Bureau of the 
Census should not only aim to achieve the greatest amount 
of overall accuracy— they should go beyond this to ensure 
that these efforts result in an enumeration that is also 
equitable. Equity is only achieved when the resources and 
skills of the professionals at the Census Bureau are used in 
such a way as to be sure that we do not overlook the im- 
balances in the undercount. 

Frankly, if the 2.5-percent undercount which the Bureau 
estimates occurred in 1970 were evenly distributed among 
all the places in the Nation and among all the different 
groups of people— rich and poor, old and young, men and 
women, black, white, and Hispanic— and if this undercount 
were not so severely concentrated among the very people 
Congress intended to aid the most, it would be of much 
less concern. But the fact is that in 1970 the Bureau reports 
it counted 97.5 percent of the population, but only 92.3 
percent of the blacks, and only 81.5 percent of black men
aged 25 to 34 years. Equity demands that we address our-
selves to this disparity. 

I cannot let this point pass without acknowledging the 
improvements that the Bureau has introduced into census- 
taking procedures. Many of these improvements are aimed 
at reducing the undercount of minorities. Yet there are 
limits to the improvements that these efforts can achieve. 
During the last year, our subcommittee has heard from many 
technically trained witnesses. One of the most impressive was 
Professor Philip Hauser. Professor Hauser— as I am sure you 
all know— was associated with the Census Bureau during 
every census since 1930. 

He was largely responsible, during his tenure as deputy 
director and director, for evaluation procedures as an integral 
part of the decennial census. Coming before us in his home- 
town of Chicago, Professor Hauser urged the Census Bureau 
to use all of the information they have to adjust the census 
figures and redress the imbalance created by the differential 
undercount. I am extremely impressed by his logic. 

Prior to the census of 1930, the Congress used to act as 
the final arbiter of census accuracy. In 1929, it passed the 
law assigning the Bureau the main responsibility for designing 
census procedures and deciding upon the topics to be 
covered in the census. By this act, the Congress made the 
Census Bureau responsible for using its resources to ensure 
that the census results reflected all the information they 
had about the size and characteristics of the population. If 
evaluation studies show that there are undercounts that fall 
unevenly on the population, the Census Bureau has a duty 
to devise procedures— using the best professional talent avail- 
able—to make the most use of this information. This con- 
gressional power is delegated to the Census Bureau with a 
mandate to use the most reliable procedures available. Under 
these circumstances, I am convinced that adjusting the census 
is appropriate. Exactly how this is done is a matter for the 
experts. That is why you have been called to this conference. 

Including information in the census that was not directly 
obtained from the residents of a household is certainly not
new. Processing errors have been corrected in this
way for several decades. During the 1970 census, more than a 
million people were added to the census counts as a result of 
the vacancy recheck program. This adjustment— that is the 
only way to refer to it— was based on a sample survey of 
vacant units. Factors were derived from this survey, and 
whole households (together with the people living in them 
and their characteristics inferred from the survey) were added 
to the census counts. These data were used in the apportion- 
ment. Furthermore, other adjustments were made in 1970. 
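The arithmetic of a factor-based vacancy adjustment of this kind can be sketched in a few lines. The function and every figure below are invented for illustration; they are not the Bureau's actual 1970 procedure or results.

```python
# Hypothetical sketch of a sample-based vacancy recheck: revisit a sample
# of units originally classified as vacant, estimate the fraction actually
# occupied, and project that factor to all vacant-classified units.

def vacancy_adjustment(total_vacant_units, sample_size,
                       sample_found_occupied, avg_household_size):
    """Estimate the number of persons to add to the count, given a
    revisit of a sample of units originally classified as vacant."""
    occupied_rate = sample_found_occupied / sample_size
    est_occupied_units = occupied_rate * total_vacant_units
    return est_occupied_units * avg_household_size

# Invented figures: 1,000,000 vacant-classified units, a 50,000-unit
# revisit sample finding 1,250 occupied, 2.8 persons per household.
added = vacancy_adjustment(1_000_000, 50_000, 1_250, 2.8)
print(round(added))  # 70000
```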

The Bureau exercised its judgment and adjusted the 
census figures to correct for the fact that unprecedented 
numbers of Americans were overseas due to our involvement 
in the Vietnam war. According to Census Bureau testimony 
presented before our subcommittee in 1976, this correction 
had the effect of awarding an extra seat to Oklahoma at the 
expense of Connecticut. The decision to include overseas 

citizens of the United States in the count of the States where 
they had ties was unprecedented in the annals of U.S. census 
taking. It involved complicated statistical procedures. Ac- 
cording to the Bureau analysis of the procedure, "the data 
on 'home State' of those overseas are of an unknown reli- 
ability." The Census Bureau has announced that this pro- 
cedure will not be used in 1980. The adjustments being 
discussed for the 1980 census would be much more reliable 
than this adjustment, which had an impact on the reapportionment.

Adjustment based on a more comprehensive methodology 
would be no different logically from the 1970 imputations. 
The distinction would be that in this case more compre- 
hensive information would be used. The papers you will 
consider in the next 2 days illustrate that adjustment can be 
accomplished through several procedures and the impact 
of each will be different. Consequently, it is important to 
arrive at a consensus as to the most equitable and accurate procedure.

In spite of the best efforts of the researchers working at 
the Bureau of the Census and the best advice you can give 
them, it seems that the procedures available for correcting 
the census count are all subject to limitations that arise 
from the assumptions that must be made before they can be 
implemented. The choice of a procedure depends upon the 
kinds of assumptions the adjuster is willing to accept. The 
results of the correction may differ in important ways, 
depending upon the procedure that is used. It is not possible 
to know the direction or amount of these differences until 
the census is completed and the corrections are made. Never- 
theless, I believe we ought to agree to use the data. For 
example, several methods that might be used to adjust the 
results of the 1980 census suffer from the problem that 
they rely upon the assumption that the errors of coverage 
present are statistically independent of each other. In other 
words, to accept the results of each of these analyses, we 
must assume that it is not likely that persons missed in 
one method (the census) would be missed in another (e.g., 
a postenumeration survey). In fact, this assumption is di- 
rectly counter to the trends that have been found. Persons 
missed in the census are more likely to be missed in a post- 
enumeration survey, to be left out of administrative records, 
and to be excluded from vital statistics. 
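The independence assumption at issue here is the one behind the classic dual-system (capture-recapture) estimator used when comparing a census with a postenumeration survey. The sketch below uses invented counts; `petersen_estimate` is an illustrative name, not a Bureau program.

```python
# Dual-system (Petersen) estimate: if the census and a postenumeration
# survey (PES) miss people independently, the total population can be
# estimated from the two counts and the number matched to both.

def petersen_estimate(in_census, in_survey, in_both):
    """Dual-system estimate of total population size."""
    return in_census * in_survey / in_both

# Invented counts: 95,000 in the census, 9,600 in the PES frame,
# 9,200 matched to both sources.
n_hat = petersen_estimate(95_000, 9_600, 9_200)
print(round(n_hat))  # 99130

# If the same hard-to-count people tend to be missed by BOTH systems
# (correlation bias), the matched count is larger than independence
# implies, and n_hat understates the true total -- the pattern the
# speaker describes.
```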

I want to assure you that the Congress will exercise very 
careful oversight of the Census Bureau's decision to adjust 
so as to be certain that this decision results in the most 
equitable procedure available. I trust that, as usual, we will 
be kept informed of the likely alternatives and decisions 
that are made so that we will have adequate opportunity to comment.

Because all of these issues raise difficult problems, I was 
very pleased to review the list of speakers at this conference. 
For example, you have invited leaders of census taking in 
Canada and Australia, which have both used census adjust- 
ment procedures. Several of the papers you will hear will
present the results of technical work that can be used in
preparing the most appropriate adjustments. On behalf of 
the House of Representatives, I would like to express our 
appreciation for all of the work that has gone into preparing 
these studies. We recognize that there may be no single, 
simple, and completely adequate solution to the problem 
you will consider. But I would like to urge you to arrive at 
a consensus as to the best that can be done and to implement 
that alternative for the benefit of the Nation. I, for one, 
look forward with great anticipation to the results of this conference.

There is one more point that I would like to address to 
the employees of the Census Bureau, since I have so many of 
them here together in the same room. I have always appreci- 
ated and supported your efforts at improving the census 
procedures. These are extensive efforts, but I fear their im- 
pact will be limited. No amount of resources will result in a 
complete enumeration. I have been very much impressed by
the arguments of Professor Nathan Keyfitz, who I see is on
the program this morning, that the decision as to what 
procedure should be followed should be announced before 
census day. I believe that if a census adjustment convention 
is announced in advance (even at this late date), it will em- 
phasize the professional and technical grounds for the 
decision and relieve some of the distrust that might otherwise 
occur. It will also promote the kind of open governmental 
procedure our Nation needs. 

During the last several months, the members of the sub- 
committee have worked to build confidence in the census, 
while we also work to make you aware of procedures that 
need to be improved. Now, here on the eve of the enumera- 
tion, I want to assure you that during a period which is 
bound to be difficult, I will work closely with you and your 
staff to help ensure that we have the most accurate and 
complete census possible. Thank you very much for asking 
me to be here. 

The Census Bureau 
Experience and Plans 

Jacob S. Siegel and Charles D. Jones 

Bureau of the Census 

Since the previous decennial census was taken, in 1970, 
the legally mandated uses of census data have proliferated to 
the point where the distribution of many billions of dollars 
depends on the outcome of the census. Consequently, the 
Bureau of the Census has been subject to a great deal of 
public and political pressure to produce an accurate count of 
the population and to adjust the counts for any inaccuracies 
remaining in the data. In designing the 1980 census, the 
Bureau has put substantial effort into improving the census-
taking procedures over those used in the 1970 census. How- 
ever, recognizing that no matter how carefully a census is 
planned and executed people will still be missed, the Census 
Bureau has been planning an extensive program to evaluate 
the coverage of the 1980 census. This document includes a 
summary of previous evaluation programs and their results, 
a description of the various techniques currently planned for 
use in measuring the coverage of the 1980 census, the plans 
for combining the various estimates, as well as a discussion 
of the effects of census errors on fund allocations. 


Beginning with the 1950 census, the Census Bureau has 
conducted systematic programs to evaluate the coverage of 
decennial censuses. The 1950 census evaluation program 
included the first large scale postenumeration survey (PES). 
This survey found an overall omission rate of 1.4 percent. 
However, subsequent analysis of these results showed that 
the PES seriously understated the amount of undercoverage. 
In fact, demographic analysis carried out later indicated that 
the probable undercount in 1950 was 3.3 percent. 

For the 1960 census, demographic analysis was used to 
produce official preferred estimates of coverage. These 
studies indicated that 5.1 million persons, or 2.7 percent, of 
the population were omitted from the census count. Of 
these omissions, 3.2 million were whites (corresponding to an 
omission rate of 2.0 percent) and 1.8 million were black and
other races (or 8.1 percent). The 1960 evaluation program 
included a number of other studies. A PES was again con- 
ducted, but unresolved problems relating to matching and 
correlation bias prevented the Bureau from obtaining useful 
results. A record check study was also conducted that 
matched sampled records from four sources— the 1950 
census, birth records, the 1950 PES, and alien address 
records— with the 1960 census. A range of estimates was 
produced, but the one viewed as most reasonable showed
gross omissions of 4.0 percent, corresponding to a net
underenumeration of 2.9 percent. The record check study 
showed a geographic pattern of error rates consistent with 
the 1950 PES. Undercount rates were highest in the South, 
followed by the West, with lower rates in the Northeast and 
North Central States. The 1960 evaluation program also 
included an evaluation of the coverage of housing units. 

The 1970 census evaluation program included a wide 
range of studies. Demographic analysis was again the source 
for preferred national estimates of undercount. According 
to this analysis, the number of persons missed increased 
slightly to 5.3 million, but the percentage undercount dropped 
to 2.5 percent. There was again a marked difference in the 
undercount rate for whites (1.9 percent) and for blacks 
(7.7 percent), as well as for males (3.3 percent) and females 
(1.8 percent). If the races and the sexes are taken together, 
even greater differences were apparent. The omission rate 
for black males was 9.9 percent, but only 2.4 percent for 
white males; 5.5 percent of black females were missed, but 
only 1.4 percent of white females. Omission rates were 
especially high— 18 percent— among black males aged 25 
to 44 years. 

Because of the difficulties with the 1950 and 1960 post- 
enumeration surveys, no such large-scale survey was planned 
for 1970. However, a number of other studies were con- 
ducted, including a CPS-census match, a Medicare-census 
match, and demographic analysis applied to States. All of 
these provided some information on the geographic distri- 
bution of the undercount. Again, various biases appeared to 
affect the results of the CPS-census match study. The esti- 
mated undercount rate was 2.3 percent overall, or 1.8 
percent for whites and 6.3 percent for blacks. Coverage 
appeared to be worst in the South followed by the North- 
east, West, and North Central regions, in that order. The 
1970 CPS-census match study showed further that coverage 
appeared to be better in urban areas than rural and better 
in metropolitan than nonmetropolitan areas. 

Information on socioeconomic differentials in coverage 
was also provided by the 1970 CPS-census match study. 
Consistent with the results of the 1950 PES, omission rates 
for lower income families were almost twice those of higher 
income families; this pattern was particularly true for whites, 
but the differences were not very great for blacks. Higher
omission rates overall were found for unemployed workers
than employed workers, but the reverse was true for blacks.
For both races, omission rates were higher for persons with
lower education.

(Prepared with the assistance of Jeffery S. Passel and Charles D. Cowan.)

The 1970 CPS-census match study also produced valuable 
information relating coverage of population and households 
that was used in planning for 1980. Only about half of all 
missed persons were omitted because their housing unit was 
missed. However, nearly three-fourths of blacks omitted were 
in enumerated housing units. That is, only a small proportion 
of blacks was missed because their housing unit was missed. 
The omissions of persons in covered households were largely 
caused by a combination of such factors as public apathy or 
indifference, carelessness or confusion in filling out the forms,
and deliberate concealment, but we cannot quantify the 
relative contribution of these factors. 

The 1970 evaluation program also included a match of 
Medicare records with the census [7] . This study showed a 
gross omission rate in the census of 4.7 percent for persons 
65 years of age or over. 

Demographic analysis was applied for the first time, on an
experimental basis, to the problem of estimating coverage for
the total population of States in 1970 [10]. The
1970 evaluation program also included studies of coverage of 
the Hispanic population [11] and of the implications of 
undercount for political representation and allocation of 
funds, in general, and the general revenue sharing program, 
in particular [9] . 


The Census Bureau's current plan for evaluating the 
coverage of the 1980 census at various geographic levels 
relies on three basic approaches. Demographic analysis will 
be used to produce official estimates of census coverage at 
the national level. The method of demographic analysis 
will also be used to prepare State estimates, but the quality 
of the resulting estimates is uncertain. A large-scale sample 
survey, if successful, would provide the basis for coverage 
estimates for States and major SMSA's and cities. Persons 
interviewed in the survey would be matched on a case-by- 
case basis to the census and possibly to various administra- 
tive record files to produce the undercount estimates. On the 
basis of these projects, the Census Bureau is planning to issue 
official estimates of census coverage for the Nation and 
States, as well as the largest SMSA's and cities. 

The Census Bureau is currently investigating methodology 
that would permit estimation of census coverage for smaller 
geographic areas. These coverage estimates would be based 
on statistical techniques, such as regression analysis or syn- 
thetic estimation, and would employ data from the sample 
survey conducted following the census, the census itself, 
and possibly other sources. According to current plans, the 
coverage estimates for local areas would be experimental in
nature. However, should the Bureau receive a clear mandate
or be directed to produce local area coverage estimates, the 
same techniques will probably be employed. 
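Synthetic estimation of the kind mentioned above can be sketched briefly: national coverage rates for demographic groups are applied to a local area's group composition. All rates, counts, and the function name below are hypothetical.

```python
# Synthetic estimation sketch: inflate each group's local census count
# by the reciprocal of its (national) coverage rate, then sum.

def synthetic_estimate(local_counts, coverage_rates):
    """Synthetic 'true population' estimate for a local area."""
    return sum(count / coverage_rates[group]
               for group, count in local_counts.items())

# Invented national coverage rates and local census counts.
coverage = {"white": 0.981, "black": 0.923, "other": 0.960}
counts = {"white": 60_000, "black": 30_000, "other": 10_000}

true_est = synthetic_estimate(counts, coverage)
print(round(true_est))                         # 104081
print(round(true_est - sum(counts.values())))  # 4081 persons imputed
```

The design choice is the one the text implies: the local area contributes only its composition, while the coverage rates come from elsewhere, so the method assumes group-specific coverage is uniform across areas.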

Several key components of these plans for producing 
coverage estimates are currently being investigated and 
reviewed with regard to their feasibility and the validity of 
the results produced. Thus, the plans presented here should 
be viewed as our most optimistic plans, i.e., those to be 
implemented under the most favorable circumstances. Un- 
fortunately, some of these procedures may fail to produce 
the required data. For example, we at the Census Bureau, 
as well as many persons working in this field, are concerned 
about the feasibility of accurately matching two sets of 
records. Since record matching is an essential part of a num- 
ber of the studies to be discussed, this problem and its 
resolution will have a major impact on the final form and 
scope of the studies to be conducted following the 1980 census.

The overall plans and objectives for evaluating the cover- 
age of the 1980 census, within the limitations noted above, 
can be summarized as follows. (It should be noted again 
that the dates cited for results of match studies are dependent 
on whether satisfactory matching can be carried out.) 

1. Preliminary national estimates of coverage 

a. Total population as estimated by demographic 
analysis by January 1, 1981. 

b. Age, sex, and race (white, black and other races) 
estimates derived by demographic analysis by 
mid-1981 and from the match studies by mid-1982. 

c. Estimates for the Hispanic population from the 
match studies by mid-1982. 

2. Revised national estimates of coverage for age, sex, 
and race/origin groups 

a. "Demographic" estimates for age, sex, and race 
groups by mid-1982. 

b. Combined estimates from demographic analysis and 
the match studies by late 1983. 

c. Estimates from the match studies (primarily) for 
Hispanic, American Indian, and Asian- and Pacific- 
American populations by late 1983. 

3. Estimates of coverage for States 

a. Preliminary estimates from the match studies by 

b. "Demographic" estimates by late 1983. 

c. Combined estimates from demographic analysis and 
the match studies by late 1983. 

4. Estimates of coverage for local areas 

a. Preliminary estimates for major cities and SMSA's 
from the match studies by mid-1982. 


b. Experimentation on estimates for local areas based 
on regression and synthetic techniques. 

c. Experimental estimates combining demographic 
estimates and match study results probably with 
synthetic-regression techniques in 1984. 

National Estimates 
Demographic Analysis 

Demographic analysis as a tool for census evaluation in- 
volves developing expected values for the population in 
various categories (such as age, sex, and race) at the
census date by the combination and manipulation of various 
types of demographic data, then comparing these values with 
the corresponding census counts. The demographic data are 
drawn from sources essentially independent of the census, 
such as birth, death, and immigration statistics, historical 
series of census data, and data from sample surveys. The 
accuracy of the method obviously depends on the quality 
and logical consistency of the demographic data. 
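The comparison described above reduces to a simple balancing equation. The figures below (in thousands) and the function names are invented for illustration:

```python
# Demographic-analysis sketch: build an expected population for a birth
# cohort from births, deaths, and net immigration, then compare it with
# the census count to obtain a net undercount rate.

def expected_population(births, deaths, net_immigration):
    """Expected census-day population for a birth cohort."""
    return births - deaths + net_immigration

def net_undercount_rate(expected, census_count):
    """Net undercount as a percentage of the expected population."""
    return 100.0 * (expected - census_count) / expected

# Invented cohort (thousands): 4,100 corrected births, 180 deaths,
# 95 net immigrants; the census counted 3,915.
exp_pop = expected_population(4_100, 180, 95)   # 4,015 thousand
print(f"{net_undercount_rate(exp_pop, 3_915):.1f}%")  # 2.5%
```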

Demographic analysis will provide a national estimate of 
net underenumeration for the total population and national 
estimates of net census error, combining coverage and classi- 
fication errors for age, sex, and race (white, black, and other 
races) groups. Following the 1970 census, this method was 
used to produce the official estimates of coverage for the 
United States as a whole (U.S. Bureau of the Census, 1974). 
(However, no corrected figures were used in any programs; 
general revenue sharing and the Current Population Survey, 
for example, used estimates without adjustment.) The 
method of demographic analysis is considered by the census 
staff to be more effective than a postcensal sample survey for 
developing satisfactory estimates of net undercounts at the 
national level. Consequently, the demographic estimates of 
census coverage are envisioned as the official national esti- 
mates for 1980. 

The particular procedure used to estimate coverage for the 
various demographic subgroups, notably age groups, depends 
on the nature of available data and on timing requirements of 
the overall evaluation program. For groups under age 45 in 
1980, i.e., persons born after 1935, estimates of the corrected 
population will be developed from birth, death, and immi- 
gration statistics. For the population over age 65, aggregate 
Medicare data will provide the basis for coverage estimates. 
For the remaining ages, 45 to 64 years, the coverage esti- 
mates will be extensions of the estimates for ages 35 to 54 
in 1970; these were derived from analysis of previous cen- 
suses. Actual death statistics will be used to allow for mor- 
tality up to age 74. Official immigration statistics, supple- 
mented by estimates of other legal immigration, illegal 
immigration, and emigration, will be used to allow for net immigration.

Different methods may be used for the same age groups 
in the preliminary and revised estimates because of the
availability of different data. In some instances, it is not
possible to specify the choice among alternatives for the 
revised estimates at this time. It should be noted that demo- 
graphic analysis has not proven successful in developing 
coverage estimates for the Hispanic population. Estimates 
of coverage for this group in 1980 are expected to be ob- 
tained from the match studies. 

Birth and death statistics. Registered births over several 
decades provide a direct basis for estimating the corrected 
numbers of persons in most age groups in 1980, covering 
a large majority of the total population. Statistics on regis- 
tered births are available (by race and sex) for all States 
since 1933. In addition, tests of birth registration complete- 
ness were conducted for 1940, 1950, and 1964-68 that 
provide correction factors for these years; factors for other 
years can be obtained by interpolation and extrapolation. 
These data will be used in estimating census coverage for the 
population under age 45 in 1980. 
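
The interpolation of completeness factors described above can be sketched as follows; the completeness values are hypothetical, not the Bureau's actual correction factors.

```python
# Illustrative correction of registered births for underregistration.
# Completeness factors are known only for test years; factors for
# other years are obtained by linear interpolation.  The values here
# are hypothetical.

def interpolate(year, known):
    """Linearly interpolate a completeness factor between test years."""
    years = sorted(known)
    if year <= years[0]:
        return known[years[0]]
    if year >= years[-1]:
        return known[years[-1]]
    for lo, hi in zip(years, years[1:]):
        if lo <= year <= hi:
            frac = (year - lo) / (hi - lo)
            return known[lo] + frac * (known[hi] - known[lo])

# Hypothetical completeness (fraction of births registered) at test years.
completeness = {1940: 0.927, 1950: 0.979, 1965: 0.992}

def corrected_births(registered, year):
    """Inflate registered births by the estimated completeness."""
    return registered / interpolate(year, completeness)

print(round(corrected_births(2_000_000, 1945)))
```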

One way of deriving U.S. totals of births corrected for 
underregistration is to aggregate the results for States (by 
race). The State estimates of births corrected for under- 
registration, for years since 1935 and for race groups, will 
serve also as basic data for deriving demographic estimates 
of coverage for States. Several projects have been undertaken 
to extend and improve this data base. If successful, these
projects would permit the estimation of coverage for States
from birth statistics for ages 45 to 64 in 1980 also. This 
technique is believed to be greatly superior to that used to 
produce coverage estimates for ages 35 to 54 for States in 
1970 (U.S. Bureau of the Census, 1977). 

Current plans call for using registered deaths in the cal- 
culations with no correction for underregistration or mis- 
reporting of the characteristics of decedents. Alternative 
calculations will permit investigation of effects on coverage 
estimates of allowances for possible underregistration of 
deaths, particularly for infant deaths in earlier years. Simi- 
larly, effects of age misreporting on death certificates can be investigated.

Immigration statistics. Data on legal immigrants admitted 
to the United States for 1935 to 1980 will be used in esti- 
mating the corrected population under age 45 in 1980. The 
Immigration and Naturalization Service (INS) publishes data 
on immigrants classified by age, sex, and country of origin. 
To be included in the immigration component of the esti- 
mates of expected population are other data items, some 
supplied by INS. These include net arrivals in the United 
States from Puerto Rico, net arrivals of civilian citizens, and 
parolees. The quality of these data is under investigation. 

One component of the expected population in 1980 for 
which data are lacking is illegal immigration. Obviously, 
because of the nature of this population, an accurate estimate 
of its size will be quite difficult to make. The range of 
existing estimates for the illegal population in recent years
is quite broad. No satisfactory estimates of either the net
flow or the illegal resident population in the United States 
are available. A variety of estimation techniques have been 
used to try to establish the number, but the true number 
remains unknown. 

The Census Bureau has undertaken an evaluation of the 
existing studies of illegal immigration to the United States 
and is investigating various approaches to the estimation 
problem in addition to those previously employed. On the 
basis of the existing work and its own research, the Bureau 
expects to develop a range of working estimates of the illegal 
alien population to be included in the estimate of the ex- 
pected population in 1980. 

Movement out of the United States (by both citizens 
and aliens) is another component for which no satisfactory 
data exist. The methods used in the past to develop emigra- 
tion estimates present problems in terms of timeliness, 
coverage, consistency and accuracy of the estimates over 
time, and the scope of the assumptions required [4] . Con- 
sequently, the Census Bureau has been considering a test of 
network (multiplicity) sampling in conjunction with the 
Current Population Survey in late 1980 to investigate the 
feasibility of obtaining information on emigration from the 
United States. 

With the multiplicity technique, respondents are asked 
whether certain specified relatives have emigrated from the 
United States. Persons with such relatives are asked further 
questions to obtain information on the emigrants and for 
weighting the sample responses. The results of the multi- 
plicity survey will not be available for the preliminary esti- 
mates of undercount so that indirect estimation techniques, 
such as those based on INS alien registration data and 
Social Security data, will have to be employed. Should the 
test of the multiplicity approach prove successful, it may 
be possible to develop emigration estimates on the basis of 
a full-scale survey conducted some time in 1981.
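
The weighting logic of the multiplicity technique can be sketched as follows; the reports and sampling fraction are hypothetical.

```python
# Sketch of a network (multiplicity) estimator for emigration.
# Each emigrant may be reported by several resident relatives; to
# avoid multiple counting, each report is weighted by the inverse of
# the number of eligible reporters, then inflated by the inverse of
# the sampling fraction.  Data and sampling fraction are hypothetical.

def multiplicity_estimate(reports, sampling_fraction):
    """reports: list of (emigrant_id, n_eligible_reporters) tuples."""
    weighted = sum(1.0 / n for _, n in reports)
    return weighted / sampling_fraction

# Three reports: two emigrants each reportable by 2 relatives, one by 1.
reports = [("A", 2), ("B", 2), ("C", 1)]
estimate = multiplicity_estimate(reports, sampling_fraction=0.001)
print(estimate)  # estimates 2,000 emigrants
```

The follow-up questions mentioned in the text supply exactly the quantity this weighting requires: the number of relatives who could have reported each emigrant.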

Medicare statistics. The corrected population aged 65 
years and over for age, race, and sex categories will be de- 
veloped from aggregated Medicare statistics. Those statistics 
will be adjusted for estimated underenrollment and then 
compared with census counts to obtain undercount esti- 
mates. The factors for adjusting the Medicare data are to be 
obtained as a by-product of the proposed match studies. 

In the meantime, two alternatives are available for pro- 
viding preliminary coverage estimates for the population 65 
and over in age-sex-race categories. One alternative would 
involve carrying forward the corrected population aged 55 
and over in 1970 with estimates of deaths and net inter- 
national migration [8] . Another would utilize preliminary 
aggregated Medicare data for 1980. These data would be 
corrected for underenrollment on the basis of correction 
factors developed from a test of record-matching tech- 
niques involving CPS, IRS, and Medicare data for February 1978.

Preliminary and Revised Estimates 

The Census Bureau is planning to release at least two 
national coverage estimates based on demographic analysis: 
(1) Preliminary estimates to meet the demand for estimates 
at the earliest possible date; and (2) revised estimates to 
incorporate data and research findings that become available 
later. Preliminary estimates of coverage for the total popula- 
tion only will be released about January 1, 1981. Preliminary
coverage estimates for age, sex, and race categories should be 
available by mid-1981. Because of the need to analyze the 
census data fully (particularly the racial categories) and to 
complete ongoing research, the revised "demographic" esti- 
mates of coverage will not be available until mid-1982. 

The preliminary and revised estimates will use different 
data and methods to estimate various components of the 
corrected population in 1980. For ages 45 to 64, the pre- 
liminary estimates will be extensions of the estimates for 
ages 35 to 54 in 1970, based on analysis of previous census 
data [8] . The revised estimates for these ages may be based 
on survivors of births as corrected at the State level on the 
basis of research now being conducted. Other revisions 
involve replacing provisional data on births, deaths, and 
immigration for 1979 and 1980 with final data. Because of 
the uncertainty involved in estimating illegal immigration 
and the availability of new data, it is very likely that the 
estimate of the illegal alien population will be modified in 
the revised coverage estimates. Furthermore, in both in- 
stances, a range of estimates may be employed. 

Problems in obtaining the required data prevent com- 
pletion before 1983 of revised estimates for the population 
over 65. The Medicare files for April 1, 1980, will not be 
complete until the end of 1980. More importantly, however, 
since a match involving the census or the coverage evaluation 
survey and administrative records is required to develop the 
revised factors for adjusting the Medicare data, the estimates 
cannot be completed until the match results are known, 
analyzed, and incorporated into the estimation procedure, 
that is, in 1983. 

Quality of the "Demographic" Estimates 

The present plans of the Census Bureau rely primarily on 
the use of demographic analysis for deriving preferred esti- 
mates of underenumeration at the national level. When the 
1970 estimated undercount was announced as 5.3 million, 
a range of error extending from 4.8 to 5.8 million was also 
offered. These figures did not represent a statistical confi- 
dence interval but rather the possible effect of errors in the 
components. In his doctoral dissertation, Fay [1] estimated 
a standard deviation in the undercount estimate of 0.5 to 
0.9 million (but it should be noted that his preferred esti- 
mate of the undercount was 6.1 million).

The development of "demographic" estimates of under- 
coverage requires the combination of data from a number of
sources. Demographic analysis includes correcting these data
for known errors. Even with these corrections, great un- 
certainty remains about the accuracy of specific components, 
in particular emigration and net illegal immigration. The 
Census Bureau is currently investigating methods for esti- 
mating variances for the demographic estimates. One 
approach would involve combining subjective confidence 
intervals for the components with conventional statistical 
techniques. By its very nature, a census undercount can be 
elusive to estimate exactly, but demographic analysis seems 
to give reasonable and reliable results. 
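
One way to combine subjective confidence intervals for the components, as mentioned above, is Monte Carlo propagation. The sketch below assumes illustrative component distributions that do not represent the Bureau's actual figures; the wide uniform ranges for illegal immigration and emigration reflect the large uncertainty noted in the text.

```python
# Monte Carlo propagation of component uncertainty into an interval
# for the national undercount estimate.  All distributions and the
# census count are hypothetical.
import random

random.seed(1980)

def simulate_undercount(census_count, n=10_000):
    """Return an approximate 95-percent interval for the undercount."""
    draws = []
    for _ in range(n):
        births = random.gauss(150.0e6, 0.5e6)       # corrected births
        deaths = random.gauss(30.0e6, 0.3e6)        # registered deaths
        net_legal = random.gauss(10.0e6, 0.2e6)     # net legal immigration
        illegal = random.uniform(2.0e6, 6.0e6)      # illegal residents
        emigration = random.uniform(1.0e6, 3.0e6)   # emigrants
        expected = births - deaths + net_legal + illegal - emigration
        draws.append(expected - census_count)
    draws.sort()
    return draws[int(0.025 * n)], draws[int(0.975 * n)]

low, high = simulate_undercount(census_count=128.0e6)
```

Such an interval describes the plausible effect of component errors rather than a sampling-based confidence interval, which parallels the interpretation given to the 1970 range of 4.8 to 5.8 million.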

Estimates for States 

Demographic Analysis 

A "demographic" approach to estimating the coverage of 
State populations was attempted for 1970 [10] . The basic 
approach involved first estimating the coverage of the popu- 
lation born in each State (for ages under 35 in 1970) by 
comparing survivors of births with the census data on the 
population born in each State. Then, several different pro- 
cedures and assumptions were used to convert the coverage 
estimates for the population born in each State into esti- 
mates for the population living in each State. The procedures 
and assumptions involved a number of parameters for which 
no data were available. Accordingly, a number of alternative 
sets of State coverage estimates for the population under 35 
were derived rather than a single preferred set. Other un- 
certainties in the estimation procedure, particularly the lack 
of any reliable methodology for ages 35 to 64, led to other 
alternative estimates and to the characterization of the 
estimates as "developmental."

For 1980, we plan again to produce "demographic" esti- 
mates of census coverage for States. Several developments 
should remove some of the uncertainties in the estimation 
methods to the point where it may be possible to derive a 
useful set of estimates. The match studies, if successful, 
would provide estimates of the relative coverage of lifetime 
interstate migrants and nonmigrants. This parameter is 
crucial to the estimation procedure; values had to be assumed 
for 1970. The information from the match study should 
permit a simplification of the method, including elimination 
of the separate calculation of coverage estimates for States 
of birth and the resulting necessity of converting those 
estimates to represent States of residence. 

The data on births, which extend back to 1935, will cover 
a larger proportion of the population in 1980. In addition, as 
previously mentioned, research is under way to extend the 
data on corrected births back to 1925 or 1915. If successful, 
these projects would virtually eliminate the need for ratio 
estimates in the middle age range. We also hope to remove 
some of the uncertainty in the coverage estimates for ages 
65 and over with the results of studies being conducted at 
the Bureau of the Census on the accuracy of residence re- 
porting in the Medicare files. 

The "demographic" estimates of coverage for States are 
based on the place-of-birth data collected on the sample 
form. Since these data will not be available before 1982 and 
considerable analytic work is necessary, the coverage esti- 
mates will not be completed before late 1983. The quality 
of the data on State of birth has apparently been deterior- 
ating in the last three censuses, e.g., higher nonresponse 
rates and evidence of greater misreporting. Should this trend 
continue, there will be serious problems with any method- 
ology based on State of birth. 

Reinterviews and Record Checks 

The Bureau of the Census has undertaken a considerable 
amount of research on the feasibility of conducting a sample 
survey as soon as possible after the census enumeration has 
been completed to meet the demand for estimates of census 
coverage for States and various local areas. Coverage error 
would be estimated by matching persons listed in the survey 
on a one-to-one basis with the census listing of names. The 
survey would be designed to provide reliable estimates of 
net coverage error at the State level for the total population, 
and at broader geographic levels, such as regions or divisions, 
for the principal races and the Hispanic population. If the 
matching can be successfully performed, the survey would 
also provide estimates of net coverage error for the total 
population of the 26 largest SMSA's, their central cities, 
and six SMSA's and their central cities which have propor- 
tionally large minority populations. The survey would also 
provide estimates for various socioeconomic categories at 
the national and regional level. (Demographic analysis, as 
such, cannot be used to provide estimates of coverage error 
for socioeconomic categories or for substate areas.) 

Postenumeration surveys were conducted as part of the 
1950 and 1960 census evaluation programs. These studies 
were not successful in providing accurate estimates of the 
undercount for certain subgroups of the population, how- 
ever. Other evidence, including estimates derived by demo- 
graphic analysis and the implausibility of the sex ratios 
shown by the PES, clearly indicated that the PES estimates 
of underenumeration were seriously biased downward. This 
bias was especially evident for black males aged 15 to 59, 
for whom the PES yielded an estimate of the undercount 
that was approximately one-half the estimate provided by 
demographic analysis. Erroneous results such as these are 
apparently caused by the problem called "correlation bias." 
This bias stems from the fact that persons enumerated in the 
census tend to be enumerated in the survey at a greater rela- 
tive rate than persons missed in the census; that is, persons 
missed in the census tend not to be reported in the survey for 
the same reasons that they were missed in the census. 
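
The downward bias described above can be illustrated with the dual-system (Petersen) estimator that underlies such matching studies; the coverage probabilities here are hypothetical.

```python
# Dual-system (Petersen) sketch.  Under independence the estimator
# recovers the true total; under correlation bias, where persons
# missed by the census are also more likely to be missed by the
# survey, it falls short.  All probabilities are hypothetical.

def petersen(n_census, n_survey, n_matched):
    """Dual-system estimate of total population size."""
    return n_census * n_survey / n_matched

true_pop = 1_000_000
p_census = 0.90                       # probability of census enumeration

# Independent case: survey coverage 0.95 regardless of census status.
n_census = true_pop * p_census
n_survey = true_pop * 0.95
n_match = true_pop * p_census * 0.95
est_indep = petersen(n_census, n_survey, n_match)

# Correlated case: persons missed by the census have only a 0.60
# chance of appearing in the survey; counted persons, 0.95.
n_survey_c = true_pop * (p_census * 0.95 + (1 - p_census) * 0.60)
n_match_c = true_pop * p_census * 0.95
est_corr = petersen(n_census, n_survey_c, n_match_c)
```

In the correlated case the estimate falls well below the true total, which is the pattern observed for black males in the 1950 and 1960 PES results.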

Sample survey methodology. A considerable amount of 
research has been conducted as part of the census pretests in 
Oakland, Calif., and Richmond, Va., to develop a method-
ology for a special survey to estimate census underenumera-
tion. This research has been designed to attempt to overcome 
some of the problems already discussed. The two post- 
enumeration surveys from the census pretests are still under- 
going analysis. Two aspects of this research may have a large 
impact on our planning: The difficulty of matching and the 
greater than anticipated clustering of errors, which will have 
a direct impact on the precision of any estimates obtained. 

The problems of matching have led to the development 
of two questionnaires combined into one form. The first 
questionnaire, called Procedure A, lists all persons at the 
sample housing unit at the time of the census. Procedure B 
lists all persons who live at the sample address and deter- 
mines where each person was living at the time of the census. 

Both procedures can be used to obtain estimates of the 
total number of persons. However, the results obtained from 
either procedure are highly dependent on how well matching 
can be done from the survey to the census. Procedure A has 
the advantage of providing very good information on ad- 
dresses but suffers from relatively poor information on 
characteristics of persons who have moved in the several 
months between census day and the survey. Procedure B, 
on the other hand, provides good data on characteristics of 
all respondents, but relatively poor information on census 
day addresses for persons who have moved since the census. 
Both procedures rely heavily on being able to take what in- 
formation is available— both the address where each person 
lived on census day and demographic data on each person— 
and match back to the census. The two pretests and the two 
earlier postenumeration surveys indicate that, despite our
best efforts, our ability to match successfully and correctly 
is suspect. 

The problems with matching are further exacerbated by 
the fact that difficulty in matching seems to be related to 
the same characteristics that underlie the undercount. 
Matching seems especially difficult in rural areas and in areas 
with high concentrations of small multiunit buildings (10 
units or fewer in a structure). Accordingly, apparently high
undercoverage rates in largely rural States or poor urban 
areas may be due to differential rates of success in matching 
and not so much to true differences in the undercoverage 
of the population. Alternative procedures for matching and 
controlling these differences are still being tested and compared.

The second major discovery from the pretests was the 
high degree of clustering of census misses and erroneous 
enumerations. Within each State or SMSA, a sample of blocks 
is selected (with a possible intermediate stage of county 
selection). The blocks are completely listed, and housing 
units within the blocks are selected as the final sampling 
stage, with all persons in a housing unit listed for the PES. 
The correlation within households for census errors and the 
correlation between housing units have been found in the 
pretests to be higher than anticipated. These correlations 
lead to larger than anticipated design effects in the variance 

estimates for underenumeration rates. The problems in
matching mentioned earlier combined with this recent 
discovery of the large design effects in the pretest make it 
difficult to predict exactly how accurate the PES estimates 
will be for States or substate areas. The sample allocation 
outlined above should yield the most reliable estimates of 
the total corrected population possible for a sample of 
250,000 housing units, but a question remains as to how 
reliable these estimates would be. 

Administrative record match (ARM). In addition to 
conducting a sample survey to estimate census coverage, the 
Census Bureau is considering the use of additional data from 
"independent" administrative files to improve the estimates 
of coverage error. To the extent that satisfactory matching 
of the administrative files, the census, and a survey can be 
achieved without impairing independence of the sample data, 
more accurate estimates of coverage error should be pro- 
duced than in 1950 or 1960. Two administrative files are 
being considered for this purpose: the Internal Revenue 
Service (IRS) tax return file for persons aged 17 to 64 years 
of age and the Medicare file for persons 65 or older. 

The feasibility of using these files is currently being 
tested. The February 1978 Current Population Survey is
being used as a proxy for a postenumeration survey in this 
test. Data were collected to facilitate a match with the ad- 
ministrative files. Dual systems estimates from this match 
for the total "corrected" population for February 1978 will 
be compared with "demographic" estimates of the total 
corrected population. If the problems of matching to the 
administrative files prove to be surmountable and the dual 
systems estimates of total population are reasonable, cover- 
age rates based on the administrative records could be used, 
along with "demographic" estimates, to adjust the survey 
estimates of coverage error in the 1980 census. 

Combination of Estimates of Census Coverage 

Demographic and survey-ARM estimates: Nation and 
States. Once estimates of coverage are available from the 
survey and the administrative record match, these estimates 
can be combined with the "demographic" estimates. Esti- 
mates from match studies can be derived for detailed demo- 
graphic, socioeconomic, or geographic categories. The 
"demographic" estimates, however, will be suitable only for 
national or State estimates of various demographic categories. 

The assumptions underlying the combination of data from 
the different sources are that the "demographic" estimates 
are more accurate than the estimates based on matching 
methods at the national level, but the matching studies are 
better for measuring differences between geographic areas 
and are the only basis for measuring coverage differences 
between socioeconomic subgroups in the population. Esti- 
mates of coverage for State populations are expected to be 
available from demographic analysis and from the match
studies. At this time, the relative quality of the two types
of estimates cannot be known. Should either prove to be 
unacceptable, the other will be used alone. If, however, both 
sets of estimates are acceptable, it would seem desirable to 
combine them, taking the variances of the estimates into 
account in the weighting procedures. Variance estimates 
would be available for the survey estimates as part of the 
evaluation of the results. Research is now in progress for 
developing estimates of the variance of demographic esti- 
mates of coverage for States. 

A possible procedure for combining the "demographic" 
estimates and the match study estimates, which takes ad- 
vantage of the better features of both types of estimates, 
would involve using the demographic estimates, particularly 
the national estimates, as "controls" or marginal totals for 
the estimates from the survey. The final product of the 
estimation procedure would be sets of tables produced from 
the results of the survey (or a merger of the survey and 
demographic analysis) "raked" to marginal totals that 
correspond to the analytic estimates for age, race, and sex 
groups nationally. The resulting estimates would be the 
adjusted counts for States, large SMSA's, and large cities. 
They would not be available before mid-1983. 
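
The "raking" procedure referred to above is iterative proportional fitting. A minimal sketch, with hypothetical counts for two states and two race groups, follows.

```python
# Sketch of "raking" survey-based state estimates to national
# demographic-analysis marginals (iterative proportional fitting).
# The table is states (rows) by race groups (columns); all counts
# are hypothetical.

def rake(table, row_totals, col_totals, iterations=50):
    """Adjust table cells so rows and columns match the given margins."""
    t = [row[:] for row in table]
    for _ in range(iterations):
        for i, target in enumerate(row_totals):           # fit rows
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_totals):           # fit columns
            s = sum(t[i][j] for i in range(len(t)))
            for i in range(len(t)):
                t[i][j] *= target / s
    return t

survey = [[480.0, 120.0],     # state 1, by race group
          [300.0, 100.0]]     # state 2
row_totals = [620.0, 410.0]   # survey-based state totals
col_totals = [800.0, 230.0]   # national "demographic" marginals by race

adjusted = rake(survey, row_totals, col_totals)
```

The procedure preserves the survey's pattern of geographic differences while forcing the national age-race-sex totals to agree with the demographic estimates, which is exactly the division of labor the text proposes.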

Estimates for Sub-State Areas 

If census data are to be adjusted for allocation of funds, 
there is a need for estimates of census coverage for all cities, 
counties, and other local units of government. Since develop- 
ing reliable coverage estimates for States and large SMSA's 
requires a very large sample, the Census Bureau obviously 
cannot afford a survey to develop coverage estimates for 
smaller areas. Accordingly, the Bureau is conducting research 
into other techniques for producing coverage estimates for 
sub-State areas. At this point, it appears that any estimates
produced will be experimental in nature. Techniques for 
validating sub-State estimates of census coverage have not 
been developed. 

Regression and Synthetic Estimation 

Two alternative procedures, regression and synthetic 
estimation, are being considered for obtaining estimates 
of census coverage for smaller areas. The postcensal survey 
is being designed to produce data that could be utilized by 
these procedures. Broadly speaking, the sample is being de- 
signed to provide reliable estimates of the corrected popula- 
tion of specified minorities and specified socioeconomic 
categories for areas broader than States (e.g., regions, or 
urban-rural populations). One application of regression 
analysis to estimating net undercount for counties might 
start with the counties that are in the sample. Regression 
models for these areas would be developed in conjunction 
with data collected in the survey and the census. These 
models would then be applied to counties not in the sample. 

Our research in this area involves the determination as to 
what alternative regression models might be used, what 
variables are important to the model, and what transforma- 
tions on the data might be needed. Other research is being 
conducted into the possibility of using smaller areas (e.g., 
blocks) or larger areas (e.g., county groups) as the basis for 
regression models. 
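
The regression approach can be sketched as follows; the single predictor (percent minority) and all data are hypothetical choices for illustration only.

```python
# Sketch of regression estimation of county undercount: fit a model
# relating measured undercount rates to area characteristics in
# sample counties, then apply it to counties outside the sample.
# The predictor and data are hypothetical.

def fit_line(x, y):
    """Ordinary least squares for a one-predictor model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Sample counties: (percent minority, measured undercount rate).
pct_minority = [5.0, 15.0, 30.0, 50.0]
undercount   = [0.010, 0.018, 0.030, 0.046]

a, b = fit_line(pct_minority, undercount)

# Predict for a county not in the sample with 40 percent minority.
predicted = a + b * 40.0
```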

Synthetic estimation of census coverage involves applying 
coverage rates for specific segments of the population (e.g., 
racial, socioeconomic, or residence categories) at a given 
geographic level (e.g., the United States) to the population 
at some subordinate level (e.g., States). For example, syn- 
thetic coverage rates for counties might be derived by apply- 
ing regional coverage rates for race/Hispanic origin and in- 
come classes to county populations disaggregated by income 
and race/Hispanic origin. Synthetic estimation has been used 
in a variety of applications [2, 3, 5] , including illustrative 
examples for evaluating the effects of adjusting census data 
for undercoverage [9] . However, comparison of "demo- 
graphic" estimates of coverage and simple synthetic estimates 
(based on age, race, and sex only) for States indicates that 
synthetic estimates not only fail to capture the full range of 
variation in coverage rates, but also differ greatly from 
demographic estimates for some States. 
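
The synthetic calculation described in the example above can be sketched as follows, with hypothetical coverage rates and county counts.

```python
# Sketch of synthetic estimation: coverage rates for population
# segments, estimated at a broader geographic level, are applied to
# an area's population composition.  Rates and counts are
# hypothetical.

# Coverage rates (counted / true) by race group, estimated nationally.
coverage_rate = {"white": 0.985, "black": 0.925, "other": 0.960}

def synthetic_corrected(county_counts):
    """Corrected county population: inflate each segment's count
    by the inverse of its national coverage rate."""
    return sum(count / coverage_rate[group]
               for group, count in county_counts.items())

county = {"white": 60_000, "black": 30_000, "other": 10_000}
corrected = synthetic_corrected(county)
undercount_rate = 1 - sum(county.values()) / corrected
```

Because every county with the same composition receives the same rate, this method cannot capture variation within the categories used, which is the shortcoming the State comparisons reveal.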

The Census Bureau is conducting research in the area of 
synthetic estimation. As noted, the survey might provide 
coverage rates for categories other than the basic demo- 
graphic ones and for areas smaller than the entire country. 
It is likely that synthetic estimates based on such categories 
would be more accurate than the simple synthetic estimates 
employing race only, or race, age, and sex only, as compo- 
nents. Further, the synthetic estimates for areas below the 
State level could be adjusted to the combined match study- 
"demographic" totals for States. Alternative synthetic tech- 
niques are being compared as well as alternative levels of 
aggregation. For any synthetic estimation procedure, the 
results of the match studies should prove quite helpful for 
validation of the estimates. 

Simple synthetic estimates based on race only could be 
produced as early as mid-1981. Addition of Hispanic origin 
to the categories requires match study estimates, which are 
not to become available even in preliminary form until mid- 
1982. More refined synthetic and regression estimation 
would require combined demographic analysis-match study
data, which will not be available until late 1983. Again, 
synthetic and regression approaches to estimation of census 
coverage of sub-State areas are, at this time, considered ex- 
perimental. The resulting estimates may not be sufficiently 
accurate to warrant their use. 


The amount of public money disbursed to States, cities, 
and other local areas on the basis of population data has 
become substantial. For example, since the inception of the
general revenue sharing program in 1972, over $42 billion
has been distributed in a 7-year period (1972-78) to over 
39,000 governmental units. With this amount of money at 
stake, the considerable interest in the effect of census under- 
coverage, particularly local area differences, on the distribu- 
tion of funds is not surprising. 

Statistical Considerations in Equity 

Concerns regarding the effect of census errors are usually 
stated in terms of equity or fairness of the fund allocations.
Notions of equity are fundamental to our system of political 
representation, taxation, and public expenditure, yet 
"equity" is a difficult concept to define. If it is defined as 
"everyone receiving a fair share," it is then necessary to 
define what is "fair." Generally what is fair is not necessarily 
obvious and consequently equity can be difficult to measure. 

Equity in allocation of resources can be characterized in 
terms of three separate aspects: need, effort, and capacity. 
Need, in this typology, is the underlying requirement for 
assistance, particularly in terms of public services, and may 
be represented by population. Effort measures the resources 
already being contributed to meeting the need; sometimes 
effort is measured relative to total resources. Capacity refers 
to the resources potentially available to meet the need and 
may be represented by per capita income. Not all aspects of 
equity necessarily come into play in all situations; determi- 
nation of political representation, amount of taxation, and 
allotment of revenues all require somewhat different char- 
acterization of equity. 

Defining equity in revenue allocation (or taxation) is not 
the duty of the Census Bureau or any other statistical agency. 
The meaning of equity must normally be taken as the intent 
of Congress as embodied in the law. Congressional intent, 
as embodied in Federal grant-in-aid formulas, allocates 
resources on the basis of need, capacity, and effort— the 
factors previously mentioned. Each factor is assumed to be 
observable, in some sense, at the State or local level; then 
each one must be defined operationally in terms of some 
statistical measure. Congress generally leaves to the Federal 
statistical agency the choice of data source or of estimation 
procedure for generating the required data series. Thus, 
equity considerations relate to how well census data (or 
other data generated by the Census Bureau) produce allo- 
cations that correspond to congressional intent. 

In connection with data used for allocation of funds, 
factors that contribute to departures from equity include 
bias, variance, cost, timeliness, and appropriateness. The 
first two factors are familiar to statisticians; cost is all too 
familiar to everyone. Timeliness denotes the extent to which 
the time frame of the data is the same as required by the 
formula. Appropriateness can be defined as the extent to 
which the concept being used (no matter how well measured) 
corresponds to the intent of the law. Clearly the distinction 
between bias and inappropriateness is arbitrary, depending 
largely upon what the analyst believes is being measured. 

Distinguishing between bias and variance depends on the 
underlying model. For example, if the probability of being 
counted in the census is assumed to be constant, all differ- 
ential undercount of areas is variance. Alternatively, if being 
enumerated is assumed to depend on race, then failure to 
correct for differential undercount between whites and 
blacks leaves "bias" in the counts; any remaining differ- 
ential undercount beyond that accounted for by differences 
in racial composition is "variance." Extensions of such 
models bring more information into the bias category. 
Equity considerations might require that known biases in 
the data be corrected. However, "known" biases tend to be 
those that are measurable, not necessarily those that are 
largest or most important. Furthermore, correcting a 
"known" bias with an estimate may introduce considerable 
"variance," with the resulting failure to correct the erroneous 
fund allocation. 

Cost and timeliness have a place in equity discussions 
also. Reduced error and thus increased equity in fund allo- 
cations has a cost. If costs are conceived of as coming out 
of the pool of funds to be allocated, then at some point it 
is to no one's advantage to spend more money to reduce 
error. Well before that point, it will not be to the advantage 
of most jurisdictions to spend more to reduce statistical 
error. The timeliness of adjustments can affect equity. For 
some adjustment procedures, the required estimates will 
take several years to complete. In the meantime, fund allo- 
cations will have to be based on estimates derived from 
unadjusted census data or data adjusted with preliminary estimates.

Impact of Census Errors on Fund Allocations 

In assessing the effects of data errors on fund allocations, 
two dimensions must be considered. First, we need to con- 
sider whether the allocation is on a per capita basis or ap- 
portioned on a competitive basis and, second, we need to 
consider whether the funds are allocated on the basis of 
total population, some segment of the population, or some 
factor or factors in addition to population, such as per 
capita income. 

Capitation grants, or funds distributed on a per capita 
basis, allocate a fixed amount of money per eligible person 
to each subdivision: 

D_i = K x N_i

where D_i is the amount allocated to the ith subdivision, K is the
fixed amount per eligible, and N_i is the eligible population in the
ith subdivision. For funds distributed on a per capita basis, the
total amount distributed depends on the size of the particular
population, without reference to
the population of other areas. Any data error obviously 
affects the distribution and, if there is an undercount of 
population, there is an underallocation of funds. 
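The capitation rule above can be sketched in a few lines (a hypothetical illustration; the areas, counts, and the rate K are invented). An undercount in one area translates dollar for dollar into an underallocation there:

```python
def capitation_allocation(counts, k):
    """Capitation grant: D_i = K * N_i for each subdivision i."""
    return {area: k * n for area, n in counts.items()}

true_counts = {"A": 10_000, "B": 20_000}
# A 5-percent undercount in area A produces a 5-percent underallocation there.
census_counts = {"A": 9_500, "B": 20_000}

k = 10.0  # dollars per eligible person (hypothetical rate)
print(capitation_allocation(true_counts, k))    # {'A': 100000.0, 'B': 200000.0}
print(capitation_allocation(census_counts, k))  # {'A': 95000.0, 'B': 200000.0}
```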


Typically, however, Federal funding programs have an 
apportionment feature; that is, a preestablished sum of 
money is distributed to a class of governmental units such 
as States or States and their political subdivisions. One 
version of proportionate allocation distributes funds to 
subdivisions in proportion to the eligible population: 

D_i = D x (N_i / Σ_j N_j)

where D is the fixed amount to be allocated. If funds are
distributed solely on the basis of population under such an 
apportionment formula and if the population of the coordi- 
nate governmental units is adjusted for underenumeration 
by a common percentage, however large, the funds appor- 
tioned to each governmental unit clearly would not be 
affected by the adjustment. It should be recognized, there- 
fore, that shifts in the funds apportioned to governmental 
units (State or local) depend on the variation in the under- 
enumeration rates among the areas. 
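This cancellation can be verified directly. In the sketch below (illustrative figures), scaling every area's count by the same 4 percent leaves each allocation unchanged up to rounding, since the common factor cancels in the shares:

```python
def apportion(total, counts):
    """Proportionate apportionment: D_i = D * N_i / sum_j N_j."""
    pool = sum(counts.values())
    return {area: total * n / pool for area, n in counts.items()}

counts = {"A": 9_500, "B": 19_000, "C": 28_500}
base = apportion(1_000_000, counts)

# Adjust every area for undercoverage by a common 4 percent:
adjusted = apportion(1_000_000, {a: n * 1.04 for a, n in counts.items()})

# Each area's funds are unchanged (up to floating-point rounding):
print(all(abs(base[a] - adjusted[a]) < 1e-6 for a in counts))  # True
```

Only variation in undercoverage rates across areas shifts funds under such a formula.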

Clearly, under competitive apportionment, discussions of 
equity should not relate to the accuracy of the estimate for a 
particular area, but rather to the accuracy of the estimates for 
all areas in a set; that is, how well does the estimated distri- 
bution reflect the unknown "true" distribution. There are 
many ways to measure these differences, for example, sum 
of absolute errors, sum of squared errors, sum of underalloca- 
tions, etc. The choice of a measure of inequity is itself a 
value judgment and would imply certain notions of equity. 
The papers by Spencer and Fellegi presented at this confer- 
ence discuss some of these issues. 
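As a rough illustration, the three measures named above might be computed as follows (the allocations are invented, and the choice among the measures remains the value judgment noted in the text):

```python
def inequity_measures(true_alloc, est_alloc):
    """Compare an estimated allocation against a hypothetical 'true' one."""
    errs = [est_alloc[a] - true_alloc[a] for a in true_alloc]
    return {
        "sum_absolute_errors": sum(abs(e) for e in errs),
        "sum_squared_errors": sum(e * e for e in errs),
        "sum_underallocations": sum(-e for e in errs if e < 0),
    }

true_alloc = {"A": 100.0, "B": 200.0, "C": 300.0}
est_alloc = {"A": 90.0, "B": 215.0, "C": 295.0}
print(inequity_measures(true_alloc, est_alloc))
# {'sum_absolute_errors': 30.0, 'sum_squared_errors': 350.0, 'sum_underallocations': 15.0}
```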

When the formula for distributing funds involves factors 
in addition to population, the adequacy of the data on these 
additional factors (e.g., per capita income) also has a bearing 
on the adequacy of the allocations. The importance of these 
other factors, vis-a-vis population, is often overlooked in the 
assessment of the impact of data errors on the equity of the 
distribution of public funds. 

The Census Bureau has carried out several studies that 
assess illustratively the effect of census underenumeration 
on the distribution of funds under various public programs. 
In an earlier study [9] , population counts for States cor- 
rected by synthetic procedures were employed to illustrate 
the effect of underenumeration on apportionment formulas. 
It was found that under an apportionment rule based on 
population size alone, the size and variation of the percent- 
age shifts in funds allocated to States resulting from a cor- 
rection of census counts would be far less than the population 
undercount rates. It was also found that more States would 
lose money than would gain money under the changed ap- 
portionment. The 1975 study also examined the effect of 
underenumeration and the underreporting of income on the 
distribution of general revenue sharing funds at the State level.

A later companion study [6] assessed the effects of data 
corrections on general revenue-sharing allocations among 

the counties and local areas in two States, New Jersey and 
Maryland. This study clearly demonstrates that, under the 
general revenue sharing formula, the adjustment for popu- 
lation undercount in the population factor alone of the 
formula results in little change in funds apportioned. 1 The 
effect of the population adjustment is dampened consider- 
ably as a result of the apportionment feature of the formula. 
The effect of the population adjustment is even less if popu- 
lation in the population factor and population in the tax 
effort factor are both adjusted, as is more reasonable. 2 The 
funds apportioned would be altered to a much greater extent, 
especially at the sub-State levels, if both population and 
income were fully adjusted for understatement. 3 Income, 
not population, then emerges as the dominant element in 
affecting the revenue-sharing allocations when the data are 
corrected for understatement of the components. 

The results of the study also reflect the tendency for most 
areas to lose money under an adjustment, whether of popu- 
lation only or of population and income combined. Most 
areas tend to move in the direction of their "proper" allot- 
ment (i.e., the allotment that would be received where all 
factors are adjusted) even when population alone is adjusted, 
but for most areas this means a loss of funds rather than a 
gain. When population rather than areas is examined, this 
tendency generally holds; that is, under most adjustments, 
the majority of people live in areas that would lose money. 
The situation is less clear for the black population considered 
separately. For some of the sets of adjusted data, the ma- 
jority of blacks live in areas that would gain money. 

The finding that the majority of States and areas (in New 
Jersey and Maryland) would lose funds if the population and 
income factors are corrected in the revenue-sharing formula 
could have significant political implications. If this pattern 
can be generalized to the distribution of funds in all States, 
it may be expected that proposals to adjust for data errors 
used in revenue sharing will be strongly opposed by the 
governmental units (which form a majority) that stand to 
lose funds by the change. 

1 The population factor was adjusted for undercoverage by the
synthetic method. The basic 3-factor revenue sharing formula that 
determines the allocation amounts for localities may be represented 
in simple form by: 

P x (P/I) x (T/I) = P x (P/I) x {T / [P x (I/P)]} = (P/I)² x T

where P represents population, I aggregate money income, (P/I)
per capita income, T net adjusted taxes, and (T/I) tax effort.

2 The population factor and the population element in the tax 
effort factor were adjusted for undercoverage by the synthetic 
method. Population cannot be adjusted in the per capita income 
factor unless the aggregate income factor is also adjusted for the 
income of the persons added by the adjustment, i.e., the per capita 
income figure as estimated is more accurate than a figure derived by 
adjusting the population component only. 

3 The population factor and the population element in the tax 
effort factor were adjusted for undercoverage by the synthetic 
method; the per capita income and tax effort factors were adjusted 
for income underreporting by substituting BEA estimated income 
for census data. 


It is not easy to define or achieve equity of allocations 
in the face of data errors. With less than perfect data, the 
allocation of funds will be less than optimal. The function of 
the statistician is not necessarily to provide error-free data, 
but rather to try to identify the largest errors and try to 
control them. However, when discussions of equity focus 
on preexisting allocation formulas for large sums of money, 
the discussions necessarily move beyond the statistical 
realm to encompass the political. Therefore, papers presented 
at this conference cover both realms. 


REFERENCES

1. Fay, Robert E., III. "Statistical Considerations in Estimating
the Current Population of the United States." Unpublished Ph.D.
dissertation, Department of Statistics, University of Chicago, 1974.

2. Gonzales, Maria Elena, and Hoza, Christine. "Small Area 
Estimation with Application to Unemployment and 
Housing Estimates." Journal of the American Statistical 
Association, 361 (March 1978), 7-15. 

3. National Center for Health Statistics. "Small Area Esti- 
mation: An Empirical Comparison of Conventional and 
Synthetic Estimators for States." Data Evaluation and 
Methods Research, Series 2, No. 82, December 1979. 

4. Passel, Jeffrey S., and Marks, Jennifer P. "Estimating 
Emigration from the United States— A Review of Data 
and Methods." Paper presented at the annual meeting 
of the Population Association of America, Philadelphia.

5. Purcell, Noel J. "Efficient Estimation for Small Domains: A
Categorical Data Analysis Approach." Unpublished Ph.D. dissertation,
Biostatistics Department, University of Michigan.

6. Robinson, J. Gregory, and Siegel, Jacob S. "Illustrative 
Assessment of the Impact of Census Underenumeration 
and Income Underreporting on Revenue Sharing Alloca- 
tions at the Local Level." Paper presented at the annual 
meeting of the American Statistical Association, Wash- 
ington, D.C., 1979. 

7. U.S. Department of Commerce, Bureau of the Census. 
"The Medicare Record Check: An Evaluation of the 
Coverage of Persons 65 Years of Age and Over in the 
1970 Census." Census of Population and Housing: 
1970. Evaluation and Research Program, PHC(E)-7. 

8. "Estimates of Coverage of Population by Sex, Race, and Age:
Demographic Analysis." Census of Population and Housing: 1970.
Evaluation and Research Program, PHC(E)-4. February 1974.

9. "Coverage of the Population in the 1970 Census and Some
Implications for Public Programs." Current Population Reports,
Series P-23, No. 56. Washington, D.C.: U.S. Government Printing
Office, 1975.

10. "Developmental Estimates of the Coverage of the Population of
States in the 1970 Census: Demographic Analysis." Current Population
Reports, Series P-23, No. 65. Washington, D.C.: U.S. Government
Printing Office, 1977.

11. "Coverage of the Hispanic Population of the United States in
the 1970 Census: A Methodological Analysis." Current Population
Reports, Series P-23, No. 82. Washington, D.C.: U.S. Government
Printing Office.

Pro and Con

Facing the Fact of 
Census Incompleteness 

Nathan Keyfitz 

Harvard University 


The way censuses are taken has a bearing on the 
undercount and hence on the use of the census for allocation 
and apportionment. Much of what follows is a discussion of 
census practice. (More extended discussion is provided in 
National Research Council [4] and Keyfitz [3] .) The con- 
cluding section shows what can be done about it; i.e., it shows 
the several ways that the parties using the census for alloca- 
tion of funds can come to agreement. The list of options 
given is intended to be exhaustive in the sense that any 
proposal for handling the undercount would fit under one 
of the three headings. This paper expresses no preference 
among the options, but attempts to set forth the advantages 
and drawbacks of each. The reader who is concerned only 
with action on the undercount can proceed directly to the 
concluding section and see where his preferences fall. 


The most easily written part of articles on census 
completeness is an exhortation to the Bureau of the Census 
to do better, to take the 1980 census exactly with no errors. 
The writers of such articles are often unaware that over the
last 40 years the Bureau of the Census has pioneered in the 
reduction of census error. The errors that remain are not due 
to negligence or ignorance on the part of Bureau personnel, 
who know more about errors of counting than any other 
group in the world. The difficulties with which this con- 
ference is concerned will not be removed by change of 
management or adoption of any obvious new methods. It 
will take a very clever journalist to see more deeply into the 
problem of completeness than have the series of brilliant 
census leaders of the past 40 years. 

Yet the nagging thought persists that the census really 
ought not to be appreciably incomplete. If 200 passengers in 
an airplane can be counted exactly, with zero error, why not 
200 million? The census is taken by dividing the country on 
maps into small areas containing an average of 1,000 persons 
and assigning the responsibility for each area to one 
enumerator. Zero error in each of 220,000 areas would mean 
zero error for the country as a whole. There are a number of 
things wrong with this commonsense view, and until it is 
disposed of the public is going to be impatient with 
discussions of the kind for which this conference has been 
called. My main point concerns the penumbra of irremovable 
arbitrariness around any permissible way of taking the census. 

Anyone can think of ways of taking a more nearly 
complete census if some of the constraints can be relaxed. If 
people could be required to stay home for one day, or until 
the enumerator calls; if people could be given a button 
indicating that they had been enumerated, and required to 
wear it; if this or some other means of showing that they had 
been enumerated, perhaps a card, were required for trans- 
acting such business as cashing a paycheck, drawing social 
security, being attended by a doctor, etc., they would have 
an immediate interest in being included in the census. Such 
devices have not been found acceptable in the United States. 
They might be applied in totalitarian societies, though there 
it turns out that other inefficiencies intervene; censuses taken 
in the U.S.S.R. have been bad, and one even had to be 
abandoned before publication. 

Public impatience tends to be proportional to the 
quantity of funds distributed on the basis of the census 
count. Yet error is an integral part of counting; the 
difference between a precise survey and a poor one is in the 
amount of error, not in that one contains errors and the 
other doesn't. 


One of the lessons of statistics is that the several sources 
of error in any survey have to be seen in relation to one 
another. With large unremovable sources of error it may not 
pay to remove smaller sources. To give an example, where 
population and income are obtained from sample surveys and 
allocation is to be made on the product, suppose that income 
per head is subject to a standard error of 10 percent and 
population to a standard error of 1 percent, these errors 
being independent of one another. If we could do nothing 
about the error of income, but by doubling the expenditure 
on the census we could reduce the population error to 0.7 
percent, then without the doubling of expenditure we have 
an overall standard error of 10.05 percent; with the double 
expenditure on the census, we have a standard error of 
10.025 percent. A 100-percent increase in expenditure 
increases the accuracy of the required product by 2.5 parts 
per thousand. 
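The arithmetic of this example can be checked directly; assuming independent relative errors that combine as the root sum of squares:

```python
import math

def combined_se(se_income, se_population):
    """Approximate relative SE of the product income-per-head x population,
    for independent relative errors (root sum of squares)."""
    return math.sqrt(se_income ** 2 + se_population ** 2)

# Income SE 10 percent, population SE 1 percent, as in the text:
print(round(combined_se(10.0, 1.0), 3))   # 10.05
# Doubling census expenditure cuts population SE to 0.7 percent:
print(round(combined_se(10.0, 0.7), 3))   # 10.024
```

The large income error dominates, so halving the already-small population error barely moves the error of the product.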


One source of definitional uncertainty is whether "popu- 
lation" includes illegal aliens. The issue is being bitterly 
debated to the point of putting the census itself in jeopardy 



(New York Times, Dec. 21, 1979; Feb. 7, 1980). The 
Constitution, as modified by the 14th Amendment, orders 
the counting of "the whole number of persons in each State, 
excluding Indians not taxed." This could well be taken as 
including illegal aliens, but a lawsuit has been entered suing 
for their omission from the census. Yet the line between 
legal and illegal immigrants has become steadily less clear in 
recent years; both obtain substantial social services and both 
pay sales and other taxes. The question whether people 
should be here ought to be addressed to the Immigration and 
Naturalization Service; the census is only concerned with 
whether they are here. 

The matter is not trivial for the States and cities that have
large numbers of them. In the apportionment of the House
of Representatives, for instance, it is said in press reports
(though in the nature of the case no one knows) that if illegal 
aliens are counted, California would gain six extra seats and 
New York would gain three. Kansas, Washington, and a 
dozen other States would correspondingly lose one repre- 
sentative each, the House total being fixed at 435. New York 
City would gain 5 percent of its allocation of Federal funds 
if the illegals are counted. These numbers are clearly ex- 
aggerated and, in any case, the illegals would be subject to 
gross omission no matter how the census set the definition. 

It is hard to imagine the question of illegal aliens arising 
when the Constitution was framed or amended. Only now,
when redistributive legislation has made them important, is the
matter viewed with deadly seriousness.

Here is an example (others will arise later) of what may be 
called the hardening of expectations. When Congress first 
offered to pay for school lunches, say at so much per year 
per child in school, no one was likely to ask whether children 
here illegally would partake. But once the program has been 
going for some time and is thoroughly incorporated in the 
community's receipts and expenditures, attention comes to 
be focused very intensively on such marginal questions. 

Though undocumented aliens are the largest, and legally 
the most interesting, of the groups that give rise to dispute 
on the part of those who will benefit from a particular 
definition of population, there are a host of other points on 
which census practice could be challenged once the door is 
opened. College students are to be counted as residents of
the college community where they study, not of their
parents' homes, even though they may be with their parents
when the enumerator calls. On the other hand, children in 
a residential secondary school are counted in the household 
of their parents, irrespective of where they happen to be at 
the time of the census. A town with a large secondary 
boarding school could well sue to have the students con- 
sidered as residents. 

Since there is no general definition of resident, the census 
has to decide where to draw its many boundary lines. The 
lines should be sharp and objective on the one hand and 
suited to the concept of "usual residence" on the other. 
These two considerations may conflict. In many instances 

the sharp definition is not appropriate to the use of the 
results. Persons who have more than one home and divide 
their time between them are to be listed where they spend 
the largest part of the calendar year, according to the census 
instructions. It would be sharper to put them where they 
spend the largest part of the current week or where they 
are actually found at the time of enumeration, but this 
would be less in accord with the objective of finding the 
usual population. 

To the arbitrariness incorporated in the census definitions 
must be added the errors in implementation by the enu- 
merator. A census is an intricate affair hedged about by 
arbitrary definitions, enumerated by people who cannot but 
add their own errors in transcribing the (not always exact) 
information to those provided them by the respondents. 


Beyond all questions of definition is the matter of those 
who should have been included in the census and who were 
just not caught. About half of these were in households that 
were not reported on any list and not known to the post 
office. The other half were members of a known household 
but were somehow omitted from its census questionnaire, 
typically, because the person responding for the household 
failed to report the individual. Either there was no evidence 
of their existence such as would put the enumerator on their 
trail, or there was evidence but the enumerator was delin- 
quent. An enumerator will make one, two, three, . . . call- 
backs for a person not at home on the first call, but a time 
comes when the most devoted enumerator gives up. 

The Bureau has shown exceptional skill in arranging 
publicity for the 1980 census. Newspapers and radio and 
television stations, especially including those run by minori- 
ties, have realized the importance of the census and have 
already given a great deal of free and favorable publicity. But 
all this, along with millions of dollars of expenditure on 
preparation, can be swamped by random unfavorable 
publicity on the eve of the census. Uncontrollable circum- 
stances, like the unwillingness of ecclesiastical authorities to 
reassure their constituents on the confidentiality of the 
census form, can work against completeness. 

Aside from all this, the quality and completeness of 
enumeration will be affected in 1980 by some of the social 
changes that we see around us. Housewives were the 
backbone of the enumeration force in earlier times, and 
other housewives usually answered the doorbells they rang. 
Now many of this group have regular jobs; an excellent 
source of census labor has dried up, on the one hand, and, on 
the other, the enumerators have to do much of the job in the evenings.

In addition to there being more people constituting 
one-person households, which have always been harder to 
enumerate than families, people are more mobile, many 
people are suspicious of government, and more are concerned 


about their privacy. There are more laws; hence, more people 
have something to conceal— from lodgers in a zone where 
they are prohibited to workers enjoying unreported incomes. 
Registration of youth for the Armed Forces, proposed 2 
months before the census date, will cause some reticence. 
On the other side, the Census Bureau is applying new tech- 
niques and spending much more money than it did. One can 
only hope that this will offset the greater difficulties. 


The unavoidable arbitrariness of the census does not end 
with the definitions as specified in the instructions nor with 
the (one suspects highly variable) interpretations of these by 
respondents and enumerators. It continues through the 
processing of census schedules. In 1970, some 4.5 million 
individuals for whose existence there was more or less 
evidence, but for whom no information on characteristics 
was reported, had to be incorporated in the tabulations by 
some kind of calculation. The convenient way of doing this, 
which has been used since 1960, is to duplicate the record 
for the last person enumerated whose characteristics were 
reported. Thus, the Bureau of the Census takes advantage of 
any homogeneity of local areas in respect of income and 
other features, since residential segregation of many kinds is 
still a fact. When data are missing for a number of 
consecutive individuals, the computer searches beyond the 
nearest person completely enumerated; the Bureau's rules do 
not permit duplication of one individual more than three times.

Such duplication of characteristics is preferable to retain- 
ing "not-stated" entries through the tabulation to the 
published volumes. As was pointed out as early as the 1940's 
by Deming, Hansen, and others, not-stated entries add to printing
costs and provide very little information.
The user is saved trouble, the printing bill is reduced, and 
accuracy is served if the nearest known case is duplicated. 
Aside from the 4.5 million, all of whose characteristics were 
obtained in this way, many others lacked information on one 
or a few questions. Income was often omitted, and since it is 
important in many allocation formulas, it was duplicated for 
some 15 percent of cases.
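A much-simplified sketch of the duplication rule described above (the record layout and the exact mechanics of the three-use limit are assumptions for illustration, not Bureau specifications): a missing record inherits the characteristics of the nearest preceding complete record, and no donor is reused more than three times.

```python
MAX_USES = 3  # assumed cap on reuse of a single donor record

def fill_missing(records):
    """Duplicate the nearest preceding complete record for each gap."""
    donor, donor_uses = None, 0
    filled = []
    for rec in records:
        if rec is not None:
            donor, donor_uses = rec, 0
            filled.append(rec)
        elif donor is not None and donor_uses < MAX_USES:
            donor_uses += 1
            filled.append(dict(donor))  # copy the donor's characteristics
        else:
            filled.append(None)  # no eligible donor within the reuse limit
    return filled

records = [{"income": 12000}, None, None, {"income": 9000}, None]
print(fill_missing(records))
```

The real procedure searches beyond the nearest complete record when runs of missing cases are long; this sketch simply leaves such cases unfilled.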

Yet few users of census data need to refer to these fine 
points, which affect the margins of the count, not its core. 
Think of the typical questions asked of the census: Are 
people marrying younger than they did? Do the rich have 
fewer children than the poor? Which service occupations are 
declining, which increasing, as we move into the postindus- 
trial society? Enumeration error, or the arbitrariness of 
definitions, matters hardly at all for these. Error is tolerable, 
and it would be wasteful to enumerate everyone; the error of 
a 5-percent sample is perfectly acceptable. 

None of this latitude is so readily accepted when the 
census is used for allocation. This article, having no sug- 
gestions for taking a perfect census, will explore the ways of 
living with imperfection. 


There is a clear criterion for how the census ought to be 
carried out in view of its use for allocation. Insofar as people 
constitute costs for a jurisdiction, that is where they should 
be counted. When Congress allocates funds for a school 
breakfast program, its intention is best served if the funds are 
given to jurisdictions according to the number of their school 
children poor enough to need the breakfast. Funds for 
supplementary benefits to the unemployed ought to be 
distributed in proportion to the number of long-term 
unemployed, etc. One need only say this to realize what a 
crude instrument any allocation formula must be. When 
Congress apportions funds according to population and a few 
other measurable variables, deviations from the target of 
hundreds of percent are easily possible. Based on the 
formula, a city may get an allowance for something it does 
not need at all. This does not mean that the 100 or so 
Federal laws covering support for education, health, trans- 
portation, housing, manpower, and other programs [2] 
are misconceived; they are necessarily targeted by some determinate
simple formula that is only more or less related to need.

It fortunately happens that a series of censuses provides 
more information than any one census. If we want to know 
how many people were present in the United States in 1980, 
we must make use not only of the 1980 census, but also of 
those of 1970, 1960, and earlier. 


It is not true that the most up-to-date census of a series 
provides all the information available on the population at a 
given time. There is indeed uncertainty that increases year by 
year because of uncertainty about how many persons died or 
emigrated and how many were added by birth and immigra- 
tion. To infer how many people are here now, using the 1970 
census, exposes us to errors in the components of change, 
and the same errors would make the 1960 and 1950 censuses 
less useful yet. 

Yet these errors of projection can be less than the known 
and persistent differences in the completeness of enu- 
meration at different ages. If we know that children 10 to 14 
years of age at last birthday are more precisely counted than 
people 20 to 24, then it could be better to estimate the 20- to 
24-year-olds in 1980 not from the 1980 census, but from the 
1970 census, with adjustment for migration and deaths. If the 
additional error of the ages 20 to 24 enumeration is much 
greater than the error of adjustment by the components, 
then we should estimate the number of those 20 to 24 in 
1980 entirely from the 1970 census and disregard the 1980 
count. The ages 10 to 14 cohort in 1970 can be supple- 
mented by births of 1956-60, which projects to a 1980 figure 
for the 20 to 24 age group that may or may not be more 
accurate than the projection of the 1970 census. 


This is one of the ways in which we will know the 
completeness of the 1980 or any other census that is part of 
a series. As a cohort ages, it passes through successive 
censuses, being enumerated more completely in some than in 
others. In the case of U.S. blacks, enumeration is grossly 
incomplete under age 5 and again at ages 20 to 65; between 
ages 5 and 20 the count is relatively complete. The method 
does not allow for persons enumerated twice; it assumes that 
the census at which the cohort is counted highest is the most complete.

Births have been more or less completely registered since 
the 1930's, so for the people born since then, we need not 
depend on censuses at all. The older population is almost 
completely registered for Medicare, so they also are known in 
total, independently of the censuses. From these and other 
sources the overall completeness of the census can be calculated.

Let us suppose that the uncertainty due to errors in 
migration, etc., increases at the rate of 0.2 percent per year, 
so that at the end of a decade a range of 2 percent is added 
to our ignorance of the cohort; at the end of 2 decades, a 
range of 4 percent, etc. Given that ages 10 to 14 in 1960 were 
only 4.4 percent short, adding an error of 2 percent would 
still be better than using the 1970 enumeration, which at 
ages 20 to 24 seems to have been 8.5 percent short. Even the 
census of 20 years ago would at this rate be better than the 
current census as far as age interval 25 to 34 years is 
concerned, it having shown 12.5 percent incompleteness in 
1970. (Figures for blacks reflect Census Bureau estimates.) 
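The comparison in this paragraph is simple arithmetic; a sketch using the percentages quoted above:

```python
# Figures from the text (Census Bureau estimates for blacks):
shortfall_1960_age_10_14 = 4.4   # percent undercount, 1960 census, ages 10-14
projection_error = 2.0           # assumed percent uncertainty added per decade
shortfall_1970_age_20_24 = 8.5   # percent undercount, 1970 census, ages 20-24

# Error of projecting the 1960 cohort forward one decade to 1970:
projected_error = shortfall_1960_age_10_14 + projection_error  # 6.4 percent
print(projected_error < shortfall_1970_age_20_24)  # True: the older census wins
```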

If we could think of the count as a random variable, and if 
errors of updating the cohort were also random, then the 
right way to estimate the number of a cohort now, say the 
number of persons 40 to 45 years, would be to take all the 
preceding censuses and average their projected numbers, with 
smaller and smaller weights going back in time. Unfor- 
tunately, this procedure applies in a clearcut way only to the 
United States as a whole. For States, the unknown migration 
may be large enough to make previous censuses obsolete. Yet 
something has been done in the Bureau of the Census for States.

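A sketch of the averaging idea described above, with hypothetical projections and an arbitrary geometric weighting scheme (the actual weights would have to come from an error model):

```python
def weighted_cohort_estimate(projections, decay=0.5):
    """Combine each census's projection of a cohort to the present,
    with geometrically declining weights going back in time.
    projections: most recent census first."""
    weights = [decay ** k for k in range(len(projections))]
    return sum(w * p for w, p in zip(weights, projections)) / sum(weights)

# Projections to 1980 of one cohort from the 1980, 1970, and 1960 censuses
# (millions, invented for illustration):
print(round(weighted_cohort_estimate([18.2, 19.0, 18.8]), 2))
```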

Through the work of J.S. Siegel and his associates [6] , 
based on methods such as those described above, we 
have some indication of the amount and distribution 
of the undercount. Presumably the smallest areas are sub- 
ject to the largest fraction of undercount. This is not easy 
to demonstrate in detail, since the smaller the area, the less 
sure we are of what the undercount is, but one item of 
evidence is obtained by comparing the distribution of $1 
billion according to the census as enumerated and according 
to the census as adjusted by the basic synthetic method, i.e., 
by multiplying each State by the correction for the Nation as 
a whole, recognizing groups of age, sex, and race. It turns out 

that the adjustment for the South Atlantic States as a whole 
is +0.5 percent, i.e., the group would obtain 0.5 percent 
more on the adjusted than on the unadjusted count. But the 
individual States range from -0.5 (West Virginia) to +4.1 
(District of Columbia) in percent difference. Similar results 
are found on comparing the total for the other regions with 
the individual States [6] . 
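The basic synthetic method referred to here can be sketched as follows (the groups, counts, and correction factors are all invented; the method in [6] uses full age-sex-race cells):

```python
# National correction factors (true count / census count) by group,
# hypothetical values for illustration:
national_correction = {"white": 1.02, "black": 1.08}

def synthetic_adjust(state_counts):
    """Multiply each State's count in each group cell by the national
    correction factor for that cell, then sum to a State total."""
    return {state: sum(n * national_correction[g] for g, n in cells.items())
            for state, cells in state_counts.items()}

states = {
    "X": {"white": 900_000, "black": 100_000},
    "Y": {"white": 500_000, "black": 500_000},
}
print(synthetic_adjust(states))
```

Because the factors are national, States differ after adjustment only through their group composition, which is why individual States can depart noticeably from their regional average.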

Only five States (including the District of Columbia) had 
an adjustment of more than 1 percent, but four of these (all 
but Hawaii) would have gained by an adjustment. My 
hypothesis is that, in general, the gains through adjustment 
are more concentrated (by area, race, and in other respects) 
than the losses. The concentration of positive effects of 
adjustment (i.e., of relative undercount) would cause the 
States that expect to gain by adjustment to complain strongly 
and leave the ones that lose indifferent. The result of this 
could well be a strong push for adjustment and only weak 
resistance to it. When a fixed sum is to be divided there is 
no net gain or loss to the whole country, but nonetheless 
the net drive to adjust can be strong. 

One might develop a statistical-political model of the 
situation where there are many small losers by an adjustment 
and a few large gainers. If the drive to adjust is convex below, 
so that the State (or other jurisdiction) with twice as much 
to gain exercises more than twice the pressure for adjust- 
ment, then the net pressure will be in favor. Such a model 
needs study by anyone trying to forecast the balance of 
pressures on the census. 
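 
A minimal numerical version of such a model can be written down directly. Under the stated assumption of convexity, the pressure function below is taken, purely for illustration, as the signed square of the stake; the stakes themselves are also illustrative.

```python
# Toy statistical-political model: gains and losses from adjustment sum
# to zero across jurisdictions, but convex pressure lets a few large
# gainers outweigh many small losers. The pressure function and the
# stakes are illustrative assumptions, not figures from the paper.

def net_pressure(stakes, pressure=lambda g: g * abs(g)):
    """Signed pressure: gainers push for adjustment, losers against."""
    return sum(pressure(g) for g in stakes)

# two jurisdictions each gaining 4 units, eight each losing 1 unit
stakes = [4.0, 4.0] + [-1.0] * 8
assert sum(stakes) == 0            # a fixed sum is merely redistributed
print(net_pressure(stakes))        # positive: net push toward adjustment
```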


The concentration of the undercount in certain age 
categories is in one sense an advantage in that it allows a 
series of censuses to provide a more exact total than any one, 
which can then be used to adjust the current census. The 
concentration in region and race groups is wholly a disadvan- 
tage in that it produced a distribution of Federal funds 
different from what Congress intended. Often it is the very 
groups Congress aimed most clearly at helping that are 
understated. The problem is to rectify the apportionment 
but not to overturn the census in so doing. 

The argument of this paper is that unanticipated random 
error has to be accepted. If to the true apportionment there 
were added a random component of expected value zero, this 
would be no worse for States than the sort of random 
variation that any jurisdiction is subject to from many 
causes. It would be comparable with bad growing weather in 
Kansas, or having one of its sons become President and thus 
producing a tourist boom for Plains, Ga. Random events that 
bring income or force expenditure are not, in general, thought 
to require Federal compensation. 

It is nonrandom variation known in advance that arouses 
just resentment. It seems only fair that the census results 
should be adjusted in the simplest possible way that will 
achieve the intentions of Congress. One such way is to 


increase not the census figure but the payment in accord 
with the relative understatement. If the net 1980 undercount 
is estimated at 1.9 percent of the enumerated for whites and 
7.7 percent for blacks, the payment for blacks would be 
raised by 7.7 − 1.9 = 5.8 percent. A municipality would be 
given a bonus of 5.8 percent for its blacks (or whatever 
corresponding figure proved appropriate for 1980) to offset 
relative census incompleteness. We know less about the 
understatement of the Spanish population, and the same 
bonus could be given to it. 
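 
As a sketch, using the paper's illustrative 1.9- and 7.7-percent rates (the per-capita payment and municipality figures below are hypothetical), the bonus attaches to the payment, not to the census count:

```python
# Payment bonus for relative undercount, as proposed above: the census
# count is left alone, and the payment for the undercounted group is
# raised by the difference in undercount rates. The rates and dollar
# figures are illustrative.

WHITE_UNDERCOUNT = 0.019   # 1.9 percent, illustrative
BLACK_UNDERCOUNT = 0.077   # 7.7 percent, illustrative
BONUS = BLACK_UNDERCOUNT - WHITE_UNDERCOUNT   # 0.058, i.e. 5.8 percent

def adjusted_payment(per_capita, counts):
    """counts: enumerated counts by group for one municipality."""
    base = per_capita * sum(counts.values())
    bonus = per_capita * counts.get("black", 0) * BONUS
    return base + bonus

# a municipality of 10,000 enumerated persons at $100 per head
print(adjusted_payment(100.0, {"white": 9000, "black": 1000}))
```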

Such an adjustment has the advantage of unpretentious- 
ness— no one would make the mistake of thinking it 
produced correct figures in each locality. The net adjustment 
for about 45 of the States would be less than 1 percent. It 
would not reduce the incentive to complete enumeration in 
any one jurisdiction. One hopes it would be temporary, and 
the Census Bureau would attain sensibly equal completeness 
for the races by 1990 or 2000. Even if it never eliminates the 
undercount overall, it could eliminate the inequity arising 
from differential undercount. 


A related aspect of our theme is the logic of negotiation, 
to which game theory is applicable. If A is willing to sell his 
house for $80,000 or more and B is willing to buy for 
anything up to $100,000, then without further data, if these 
two are the only participants in the purchase and sale, the 
price is indeterminate in a range of $20,000. Negotiation is 
likely to be casual to the point where $5,000 or $10,000 
alteration in the price can be conceded on an impulse by 
either party. It is possible that A proposes $95,000 and B 
proposes $85,000 and they genially split the difference at $90,000. 

Once they have signed up for $90,000, further alterations 
will not be made casually. The seller wants to take away a 
light fixture worth $35 because it would go well in his new 
home; the buyer sternly points out that light fixtures were 
implicitly sold as part of the house in the deal that both 
agreed to. 

In application to our census undercount problem, the 
parties are far more likely to come to agreement before the 
census is taken than after the count is known. To leave 
conditions open in any important respect is like leaving 
unspecified elements in a contract— it makes agreement very 
difficult and leads to highly unprofitable litigation. The 
sooner all participants develop expectations consistent with 
what is going to happen, the fewer the occasions of 
disagreement. The Census Bureau has been acting in accord 
with this principle in informing local authorities of its 
procedures in the utmost detail; perhaps it could go further. 


Once the essential matter is settled— that payments will be 
made in proportion to population— negotiation shifts to 

marginal variations in the definition and count of popula- 
tions. An incentive is offered to seek out any part of the 
census procedure that could be made to seem wrong, an 
incentive amounting to millions of dollars in the case of a 
medium-sized city, to hundreds of millions in the case of 
New York. The dynamics here could destroy the best census 
in the world. All possible census procedures contain arbitrary 
elements, and it is in the nature of census taking that no 
census can stand up to such partisan examination. 

Every piece of the legislation we are concerned with has a 
target, and it should have a convention that will provide a 
reasonable approximation to the target. That a convention 
can be accepted that everyone knows is only a rough 
approximation to the target is shown by the 200-year history 
of apportionment in the House of Representatives. Alabama 
was actually given 7 representatives on the basis of the 1970 
census, but census figures corrected by methods beyond 
challenge, using the previous census and other data, would 
entitle it to 8; and California would give up one of its 43 
[6] . After most censuses, at least one State turns out to be 
so deprived because the convention (in this case the census 
count as published by the Bureau) is not quite on target. Yet 
no one says that democracy is frustrated by California having 
one seat too many and Alabama one too few. 


On apportionment of funds, the problem before us is to 
choose among conventions. The choice involves a conflict 
between determinacy and precision that may be illustrated 
by the way postcensal adjustments can be made. 

One convention is to take the published count at the last 
preceding census and stay with it for 10 years. This has the 
advantage that each jurisdiction would know at the start 
what it was to get and could make plans for disbursement. 
But some jurisdictions might protest that they had been 
growing and were likely to grow in the future, so the "last 
census" formula would give them less than Congress intended 
them to have. The point might be met by saying that every 
jurisdiction would get the straight-line projection of its 
population from the last two censuses. This would have the 
same advantage of being immediately known and calculable 
and does not discriminate against the growing parts of the 
country. The future is sufficiently unknown that this does 
not discriminate against anyone in an obvious way. 
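 
A straight-line convention of this kind is trivial to state and to reproduce. The sketch below, with hypothetical counts, is the whole of it:

```python
# Straight-line postcensal projection from the last two decennial
# counts, as in the convention discussed above. Anyone applying it to
# the same published counts gets exactly the same number. The counts
# here are hypothetical.

def straight_line(pop_prev, pop_last, years_since_last, intercensal=10):
    """Linear extrapolation beyond the most recent census."""
    annual_change = (pop_last - pop_prev) / intercensal
    return pop_last + annual_change * years_since_last

# counted at 50,000 in 1960 and 60,000 in 1970; estimate for 1975
print(straight_line(50_000, 60_000, years_since_last=5))  # 65000.0
```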

Intercensal estimates are not the subject of this con- 
ference, and I cite them only to bring out some of the issues, 
in particular to help see the choice to be made between 
accord with the objective of the legislation and accuracy. For 
clearly anyone can do better than the straight-line pro- 
jections with such supplementary material as building 
permits, city directories, or local counts made after the last 
census. But these would depend on how the person making 
them chose among various items of ancillary data. Is a 
straight-line extrapolation, which gives exactly the same 


number irrespective of who makes it, preferable to a pro- 
fessional estimate that depends on someone's decision to 
trust building permits rather than a city directory? The 
user who wants determinacy, simplicity, and objectivity 
above all will take the first; the one who wants precision and 
fidelity to the object of the legislation will take the second. 
The choice of intercensal estimates is analogous to the choice 
on the number to accept for the census itself. 


To adjust the census or not to adjust it is far from a 
symmetrical choice. The unadjusted result that comes out of 
the standard and traditional Census Bureau procedures is a 
single possibility; adjustment is many possibilities. Once 
adjustment is decided on, it would have to be decided further 
whether (a) to use an objective and uniform, though 
admittedly inaccurate, synthetic method, or (b) a subjective 
method that would give greater precision. If the former, 
whether the base figure would be the whole of the United 
States or individual regions. Figure 1 shows part of the 
decision tree that would be required if there is to be adjustment. 

It might be decided to match with the postenumeration 
survey (PES), find the people included in the PES that were 
missed in the census, identify their characteristics, and cor- 
rect the count for each jurisdiction by regression. But the 
size of the PES sample would not suffice to provide data for 
individual jurisdictions. Hence, the ratios or regressions 
would have to be obtained for some larger area, perhaps a 
State or region, and applied to all individual jurisdictions 
within the larger area; the procedure has come to be called 
synthetic. To decide how large the area should be from 
which the coefficients would be obtained necessitates a 
tradeoff between sampling error (which grows greater as one 
attempts to derive coefficients from smaller areas) and 
appropriateness of the coefficients (which grows greater as the 
area from which they are derived becomes smaller). Beyond 
this is the question of what characteristics shall be recognized 
in the regressions: Would one go beyond race to age and sex, 
and beyond that to income, size of dwelling, etc.? The de- 
cision diagram, as drawn up here, contains only a small part 
of the choices that would have to be made once we start 
down the road of adjustment. 

Beyond the decisions indicated in the tree are other 
choices. Would the census be adjusted for all purposes down 
to the finest tabulations? This could be done consistently 
very easily by computer. For example, if it were decided that 
each black enumerated was to count for 1.06, then that 
could be incorporated individual by individual in program- 
ming the tabulations, which would then come out consistent 
with one another. (The alternative of multiplying up the 
finished tables to adjust to the new total has the disadvantage 
that different cross-tabulations would not be consistent with 
one another.) 
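 
The consistency point can be seen in a few lines. In this sketch (the records and the 1.06 weight are illustrative), the weight is carried record by record, so every tabulation sums to the same adjusted total:

```python
# Per-record weighting, as described above: if each enumerated black
# counts 1.06 in the tabulation programs, every cross-tabulation comes
# out consistent with every other. The records here are hypothetical.

RECORDS = [
    {"race": "black", "age": "under 18"},
    {"race": "black", "age": "18 and over"},
    {"race": "white", "age": "18 and over"},
]
WEIGHTS = {"black": 1.06, "white": 1.00}

def tabulate(key):
    """Weighted tally of RECORDS by the given characteristic."""
    table = {}
    for r in RECORDS:
        table[r[key]] = table.get(r[key], 0.0) + WEIGHTS[r["race"]]
    return table

by_race, by_age = tabulate("race"), tabulate("age")
# both tables carry the same adjusted total, unlike finished tables
# that are scaled up after the fact
assert abs(sum(by_race.values()) - sum(by_age.values())) < 1e-9
print(by_race, by_age)
```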

Such a diagram provides a framework within which each 
person can find his or her preference; that is, both the good 
and the bad of it. In a sense, it is guilty of complicating the 
task before this conference by suggesting different numbers 
for different users. Some will want a straightforward method 
that anyone can verify from published census figures on his 
own scratch pad; others will seek every last gain in accuracy, 
even at the cost (however great) of complexity. 


What is the role of knowledge in all this? Any trained 
demographer can use outside information to ascertain that 
the census is short and can produce some kind of correction. 
It is little more trouble to do this separately for whites and 
nonwhites. Such calculations that would use births, social 
security, Medicare, and other noncensus data, according to 
the judgment of the particular demographer, would produce 
a result better than the census— 1- or 2- or 3-percent 
incompleteness with the material of 1980 might be shown. 
The cost of this would be a few thousand dollars worth of a 
demographer's time. 

But what about the Hispanics? Unlike whites and 
nonwhites, Hispanics cannot be identified in consistent 
fashion on past birth and death certificates, and this makes 
the extra census calculation much more elusive. The Census 
Bureau [7] has tried to see what it could make of aggre- 
gate statistics; its resulting bulletin does not lay claim to 
any dazzling result. If the matter is important enough, 
one could resort to a sample in the field, but then the cost 
rises from thousands of dollars to millions. One would have 
to draw a sample of the population (presumably with 
heavier sampling ratios in areas where Hispanics were con- 
centrated) and then make a name-by-name check to the 
census to see what fraction was missed. When this was done 
with the population as a whole for 1970, it came up with 
a fraction missed smaller than that given by the demographic 
calculation using outside sources. For the Hispanics, there might be greater 
difficulties than for other elements— for instance, native 
white— so we can be sure that the undercount as estimated 
by an independent survey and matching would be too low. It is 
not inconceivable that 5 percent of Hispanics would really 
be missed in the census, and the fresh enumeration plus 
matching process would discover an undercount of only 3 percent. 

All this is at the national level. For individual jurisdic- 
tions, some kind of synthetic method is inevitable— using 
either the undercount ratio for the United States as a whole 
or that for a region or State. Some experimenting has been 
done on the error of a synthetic estimate and more would be 
useful. Savage [5] outlines an extensive research program, 
and clearly we ought to get all the knowledge we can of the 
effects of various proposed methods of adjustment. But work 
on this would share a feature common to all research; no 
one can be sure in advance that it will succeed and produce 
usable results. 


[Figure 1 is not reproducible here. Its legible branch labels include 
"No adjustment," "What is judged most precise for each," "U.S. as," 
"State as," "Census Bureau," "to other," and "race, age, and sex."] 

Figure 1. A Partial Decision Tree for the 1980 Census Showing the Asymmetry 
Between "No Adjustment" and "Adjustment" 



The problem of the undercount is part of a wider 
problem: At its margins the 1980 census of the United 
States, like any census anywhere, is arbitrary. This is not in 
the sense that the census-taking agency makes up the figures, 
which it does not; it follows very strict and carefully worked 
out procedures. It is arbitrary in the sense that other 
procedures could be devised that would produce slightly 
different results, yet be technically defensible. The effects of 
the decisions made at the margin are numerically small, and 
they would hardly affect most of the uses of the census, 
which are concerned with providing information. There is 
one use that they do affect— the allocation of funds. For 
information purposes, a 1 -percent variation is negligible; for 
allocation of funds, it means millions of dollars. The problem 
is a new one to statisticians and demographers, and the books 
do not even discuss it, let alone provide answers. 

This applies to the decisions that determine the published 
count; for example, putting college students into the area 
where they live while at college rather than where their 
parents live and the opposite for students in secondary resi- 
dential schools. It applies even more to any allowance for the 
undercount that would be made on top of the enumeration. 

Because of this variability, the distribution of funds is 
possible only on the basis of an agreed-on convention. One 
way of securing agreement among all parties is by discussion 
in advance of the census, before the numbers that come out 
of the several ways of taking the census are known. All that 
is needed for such agreement is an unbiased expected value. 
After the count, when it is known that New York will get 
more on one reasonable adjustment or New Mexico on 
another, voluntary agreement is impossible, and a solution 
has to be imposed. It could be imposed by the legislature or 
by some public body of representative citizens approved by 
the legislature. It could be imposed by the Secretary of 
Commerce without such a body, using powers delegated by Congress. 


In the establishment of a convention, both political and 
technical considerations enter. The Bureau of the Census has 
pioneered in the improvement of its own figures, and its 
statisticians are thoroughly aware of the properties of various 
proposed ways of adjusting for the undercount. No agency of 
government contains more knowledgeable, skilled, and 
devoted technicians than the Census Bureau. Its methods of 
survey and sampling have been adopted throughout the 
world. But it would be wrong to ask the Bureau to go 
beyond technical decisions to political ones, and we have 
here a problem for which technical considerations do not 
provide a unique answer. 

There are several distinct ways to adjust the census, any 
one of which will, on the average, come closer to the actual 
number of people present, jurisdiction by jurisdiction, than 
the census as enumerated. Each could be the basis of a 
convention. The several possible conventions fall into three 
kinds: 

1. Accepting the census as enumerated. 

2. Adjusting the census count by a simple objective 
method that anyone can apply to all jurisdictions and 
obtain a unique answer. 

3. Using all existing data and doing the best one can, 
jurisdiction by jurisdiction, and foregoing the unique- 
ness and objectivity of convention 2. 

1. The Census as Enumerated. The first of these has been 
the convention of the past. The census number is the count 
of all those persons of whose individual existence the Bureau, 
using its standard procedures, has evidence. Like the gold 
standard, it is a superstition, perhaps, but one that avoids 
inflation and confusion. As used in the past, it has been 
sustained by the courts and prevents any suspicion that the 
Bureau of the Census is fudging its figures. The uncounted 
are those who were not found with the effort mounted by 
skilled census takers operating with a given finite budget. On 
this convention, those not found are treated as not present; 
the only argument that could arise would concern whether 
the census effort is large enough or whether its budget should 
be increased. 

2. A Simple Objective Adjustment. The simplest example 
of the second, a uniform method that would take out the 
major differential of completeness for minorities, is to give 
those minorities in each jurisdiction a bonus equal to their 
relative undercount for the country as a whole. If this had 
been applied in 1970, each black would have been counted at 
1.06 in reckoning the population of jurisdictions for revenue 
sharing and other purposes. Failing any way of making a 
similar calculation for Hispanics, they too would have been 
given a bonus of the same 6 percent. Other minorities would 
have remained unadjusted. No one has any idea of what 1980 
number will correspond to the 6 percent for 1970, and we 
will have to wait 2 or 3 years to find out. It may be less than 
6 percent, because the census effort is greater; it may be 
more, because new problems have arisen. 

While this adjustment removes the bias for the country 
as a whole, it by no means ensures accuracy in indivi- 
dual jurisdictions [8] . It is a synthetic method, in the 
sense that it applies a ratio derived from one area— the United 
States as a whole— to another area— the particular juris- 
diction. Possibly it can be improved by taking the ratio not 
from the United States as a whole, but from the particular 
region or State. Thus, it would seem more appropriate to use 
the undercount of New York State in adjusting New York 
City than the undercount for the country as a whole. The 
trouble is that undercount for New York State is not as 
accurately known as that for the United States. The 
calculation depends on estimates of migration, and the 
migration among States is unrecorded, while that for the 
United States as a whole is at least in large part recorded, so 
the undercount ratio will always be better known for the 
country than for each State. The decision as to which is preferable 
cannot be made without extensive experimenting, and the 


Census Bureau is in a better position to do such experi- 
menting than any other body. 

There are still further possibilities of improving the 
estimates for jurisdictions on an objective basis, and with a 
method that would be applied uniformly to all 39,000 
jurisdictions. That is. carry out a postenumeration survey 
of a kind used in 1970, but larger, in which a sample of 
areas containing some hundreds of thousands of people 
would be enumerated a second time, independently of the 
census, but very close in time to April 1, 1980. The 
enumerators who would do the PES would be selected 
among the very best of those hired for the census; they 
would be given further training; they would be better paid 
and allowed more time to search for people, both in 
apparently vacant houses and houses known to be occupied. 
They would certainly do a better job than the original census 
takers [1] . 

The PES areas would then be compared name by name 
with the census. If the PES were perfectly complete, and if 
the matching of names could be carried out exactly, then one 
would know not only what fraction of blacks and Hispanics 
were omitted, but also omissions of people of the several 
ages, occupations, income groups, etc. These omission rates 
could then be applied to jurisdictions. The method would 
still be synthetic, in that the omission rates applied to any 
given jurisdiction would not be obtained from that juris- 
diction, but would have to be derived from some larger 
area like an entire State or region. With the assumptions 
above, there would be a clear gain in accuracy over using the 
undercount as calculated without the PES. 

Unfortunately, the assumptions are very strong. The PES 
enumerators do better, but their work is not perfect. And the 
matching of people they turned up with those appearing in 
the census is subject to considerable error, whether done by 
hand or by machine. Names are misspelled and people move 
even in the few days that elapse between the census and the 
PES. Most persons are properly enumerated, and the search is 
for the 2 or 3 percent missed. If there is doubt on the 
matching of 5 or 10 percent of the names, the undercount is 
lost amid the matching errors. Once again experimenting will 
be needed to see if the PES can provide better ratios than the 
calculation using births, Medicare registrants, and previous censuses. 

With the knowledge now in existence, one cannot decide 
which of the above methods will give the best results for 
individual jurisdictions, but we can say where the burden of 
proof lies. We know that the simplest method of taking out 
the bias (say, multiplying blacks and Hispanics by 1.06) will 
improve the census to some degree. Only more complex 
methods that demonstrably improve on that improvement 
should be considered. 

3. Using All Existing Data and Trusting to Judgment. All 
of the above concern the second approach, in which an 
objective method is agreed on or legislated and applied in a 
perfectly mechanical fashion to all jurisdictions. The third 
kind of convention would put its trust in one agency, either 
the Bureau of the Census itself or some other prestigious 
technical group. That body would do the best it could for 
each of the 39,000 jurisdictions. In some areas, it might have 
good building permits that would enable it to detect omitted 
new dwellings, and it would adjust for these. In other areas, 
there might be a local census that the technical group judged 
reliable; it would turn to that. There are many other ways in 
which it could use judgment on what ancillary data to bring 
in, and the knowledge and experience of its personnel would 
be such that it would surely obtain a better result than 
applying a uniform method throughout. 

But judgment is the key word here. Others would judge 
differently, and given the partisanship that the distribution 
of funds creates, the only way of getting a unique and 
workable result out of this third convention is to agree that 
the population of each of the jurisdictions is what the 
appointed technical group says it is. Its choice of ancillary 
data could not be opened to criticism or review by the 
jurisdictions concerned, because they could easily come up 
with a result more favorable to themselves. If one wants the 
kind of improvement in accuracy that this way of doing 
things can provide, one has to forego the right to check the 
technical group. Those who insist on checking the judgment 
of the technical group against their own judgment are giving 
up the possibility of a unique solution. They should 
properly revert to the second type of convention, where the 
calculation is simple and perfectly objective. 


The main concern is the integrity of the census. As long as 
that is preserved, we can live with any of the methods. Even 
if it were decided that convention 1 was to be followed, the 
minority groups need not suffer. Congress, noting that the 
undercount handicaps them, could merely change the for- 
mulas used on the several grants. It could, for example, add 
to what each jurisdiction receives an amount equal to 6 
percent times its fraction of blacks. Convention 2 is objective 
and open to verification by those concerned. Convention 3 
would give a result closer to the actual population, but would 
require that everyone involved trust the technical group 
(Census Bureau or other) that did the work and forego the 
right of review. Each has its advantage. 

What will be fatal is indecision among the methods. 
Since for any given jurisdiction they give different numerical 
results, and none is in any sense "right," indecision will be an 
invitation to each jurisdiction to press its own way of 
calculating the undercount. It will have recourse to the 
courts, which do not know the true population any more 
than the rest of us do, and which could well come up with 
different solutions in different cases and so establish mutu- 
ally contradictory precedents. The reputation of the Bureau 
of the Census could hardly stand up against the host of 
pressures. In effect, millions of dollars in reward would be 
offered to a jurisdiction that could prove the census wrong. 
No institution has a reputation solid enough to stand against 
that kind of incentive system. 

How to avoid the threatened confusion and danger to the 
Census Bureau's integrity ought to be a preoccupation of 
citizens and of Congress. It can be protected by dividing the 
task between the part that is technical and the part that is 
political and assigning each to a competent body. An answer 
that has been suggested, and with which I am thoroughly in 
agreement, would involve just two steps: 

1. Ask the Census Bureau to set down the options for 
adjustment. It would be able to describe technically 
respectable ways, falling under the three heads above, 
in much more detail and with much more knowledge 
than this writer can furnish. It would assure the 
authorities that any of the methods of adjustment it 
set down would provide estimates for jurisdictions that 
on the average (not necessarily in each separate 
jurisdiction) would come closer to the true population 
than the unadjusted 1980 figure. It could easily offer 
more than one method falling under convention 2. 


2. Either the legislature or some other representative (i.e., 
political) body, not primarily consisting of technicians, 
but, of course, able to consult them, would determine 
which of the methods listed under (1) would be used. 
Alternatively, it might decide on conventions 1 (no 
adjustment) or 3 (entrusting the whole matter to the 
Census Bureau or some other technical body, without 
the privilege of criticizing its resulting numbers). 

This seems the most orderly way to take advantage of the 
technical skill of the Bureau of the Census, and to avoid 
burdening it with political responsibility. 


1. Bateman, V., and Cowan, Charles D. "Plans for 1980 
Census Coverage Evaluation." Unpublished paper presented 
at the August 1979 meetings of the American Statistical 
Association. 

2. Gonzalez, Maria Elena. "Characteristics of Formulas and 
Data Used in the Allocation of Federal Funds." Forth- 
coming in The American Statistician. 

3. Keyfitz, Nathan. "Information and Allocation: Two Uses 
of the 1980 Census." The American Statistician 33, 2 
(1979). 
4. National Research Council. Counting the People in 1980: 
An Appraisal of Census Plans. Washington, D.C.: Na- 
tional Academy of Sciences, 1978. 

5. Savage, Richard. Letter to Conrad Taeuber, dated Feb. 28, 

6. U.S. Department of Commerce, Bureau of the Census. 
"Coverage of Population in the 1970 Census and Some 
Implications for Public Programs," Current Population 
Reports, Series P-23, No. 56. Washington, D.C.: U.S. 
Government Printing Office, 1975. 

7. U.S. Department of Commerce, Bureau of the Census. 
"Coverage of the Hispanic Population of the United States 
in the 1970 Census: A Methodological Analysis," Current 
Population Reports, Series P-23, No. 82. Washington, 
D.C.: U.S. Government Printing Office, 1979. 

8. U.S. Department of Health, Education, and Welfare. 
National Center for Health Statistics. "Small Area Esti- 
mation: An Empirical Comparison of Conventional and 
Synthetic Estimators for States." Vital and Health Sta- 
tistics, Series 2, No. 82. 1979. 

Adjusting for Decennial Census Undercount: 
An Environmental Impact Statement 

Peter K. Francese 

American Demographics 


For more than half a century, the Bureau of the Census 
has created and shaped a statistical environment. The 
decennial census is the largest, most expensive, and from the 
public's point of view, the most visible part of that 
environment. Like the physical environment, the census 
touches the life of each American and more than just once 
every 10 years. Federal and State fund allocations and 
corporate expenditures are influenced daily by census results. 
Any significant change in the way the census is taken or 
tabulated will affect the political, legal, institutional, and 
financial environment of private and public activities. There 
can be no doubt that the planned adjustment of decennial 
census figures for the lack of a 100-percent count is a 
significant change from present policy that will have 
far-reaching and long-lasting consequences. 

Since the passage of the National Environmental Policy 
Act of 1969, any project or action that may affect the 
environment requires an environmental impact statement. 
The drafters of that statute probably were thinking more 
about river-course adjustment than statistical adjustment, but 
the manner in which vast sums of money are spent through 
the use of census statistics affects us just as directly as a 
diverted river. The 1969 act recognizes this when it states 
that the Federal Government shall improve and coordinate 
its plans and programs so as to "achieve a balance between 
population and resource use which will permit high standards 
of living and a wide sharing of life's amenities." 

An environmental impact statement requires a detailed 
description of the following: 

1. The proposed action 

2. The present environment 

3. The expected impact of the proposed action on the 
present environment 

The description of the proposed action details its purpose, 
nature, and extent, as well as its timing and methods of 
execution. The description of the present condition includes 
sufficient detail to discover, insofar as possible before any 
action, all life forms or other things of a unique and 
irreplaceable nature that might be affected by the proposed 
action. In addition to a general discussion of the impact of 
the proposed action on the environment, the third part of 
the statement includes details of: 

a. Any unavoidable adverse effects 

b. Any irreversible commitment of resources 

c. The possible impact on long-term use or productivity 

d. Any mitigating measures that might be taken 

e. Any alternatives to the proposed action 

The main part of this paper will follow the format outlined 
above in an effort to determine the impact of adjusting for 
census undercount. 


In June of 1979, Senator Daniel P. Moynihan introduced a 
bill (S.B. 1606) that would have amended title 13 of the U.S. 
Code to say "In conducting that census in 1980 the 
Secretary shall adjust the population figures, employing the 
best available methodology to correct for undercounting. 
These adjusted population figures shall be used by every 
Federal officer or employee administering a program under 
which funds or benefits are allocated or distributed among 
States or other units of government on the basis of population." 

The Panel on Decennial Census Plans, convened by the 
National Academy of Sciences in 1977, concluded in 
recommendation 23 that "inequities resulting from the 
geographic differentials in the decennial census undercount 
could be reduced by adjustment of the data for 
underenumeration. Methods of adjustment with tolerable 
accuracy are feasible. While the application of these methods 
has some arbitrary features and while the figures for some 
areas would not be made closer to the correct distribution of 
population, the panel believes that on balance an 
improvement in equity would be achieved." The panel goes 
on to say that "If the Secretary of Commerce agrees with the 
panel's conclusion" the Census Bureau should be directed by 
the Secretary of Commerce to "adjust for underenumeration 
the counts for total population of the United States, the 
States, and local areas, for use in distributing funds. The 
adjustments would not be applied to the counts used for 
legislative apportionment nor to the body of census data on 
the characteristics of the population." 

These are only two of a number of proposals for 
undercount adjustments, but they reflect the general 
consensus on the purpose and limitations of such adjustment. 
The purpose is to foster greater equity in the distribution of 
Federal funds. Prior to the widespread use of census statistics 
for allocating funds, undercount adjustment was hardly ever 
an issue. But, during the next decade, the amount of 
money— billions of dollars— to be allocated to State and local 
governments will be large enough for them to be asking, "Are 
we getting our fair share?" 

The two limitations most generally agreed upon are that 
the adjusted figures would not be used for reapportionment 
and that adjustments would be made only to the total 
population for State and local governmental entities. The 
first limitation is imposed primarily because of time 
constraints. It would be technically impossible to produce 
estimates of the undercount in time to give the President 
reapportionment figures by January 1, 1981. Besides, 
population data for city blocks (an area too small for any 
adjustment procedure now considered) are used to draw 
congressional district boundaries. The second limitation 
comes about because many of the other items of data from 
the census (such as income) suffer more from underreporting 
than undercount. 

It will be assumed here that adjustment would be limited 
to the total population, but that characteristics such as age, 
race, sex, and location (i.e., urban versus rural) would be 
taken into consideration in the adjustment procedure. The 
geographic areas to be considered for adjustment are States, 
SMSA's, and revenue-sharing jurisdictions. The Bureau of the 
Census' current plans for 1980 census coverage evaluation 
will produce estimates of undercount for the Nation, regions, 
divisions, States (including the District of Columbia), and 
probably the 20 largest SMSA's. Estimates of undercount for 
the smaller revenue-sharing areas would require the use of a 
different method than the one used for large areas. Most 
likely it would be a synthetic- or regression-estimation procedure. 
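The small-area step can be illustrated with a synthetic estimator, the simpler of the two techniques just named. The sketch below assumes, as the text suggests, that undercount rates estimated for broad groups (by race, age, and so on) are applied to each area's own group composition; the group names, rates, and counts are all hypothetical.

```python
# A minimal sketch of synthetic estimation for a small area. Group
# names, undercount rates, and counts below are hypothetical.

national_undercount_rate = {"group_a": 0.060, "group_b": 0.015}

def synthetic_adjusted_total(area_counts):
    """Inflate each group's census count by its group undercount rate."""
    return sum(count / (1 - national_undercount_rate[group])
               for group, count in area_counts.items())

area = {"group_a": 2_000, "group_b": 8_000}
print(round(synthetic_adjusted_total(area)))  # 10249
```

The adjusted total exceeds the 10,000 counted because each group is scaled up by its own rate, which is how differential undercount enters a small-area figure.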

The methods currently being contemplated for estimating 
census undercount for States and other large areas are 
demographic analysis and a postcensus sample survey known 
as the postenumeration survey (PES). Demographic analysis 
has already begun. It involves analyzing past census data and 
administrative records to develop an expected population. 
The PES will be an independent sample survey of some 
250,000 households to begin shortly after the census. The 
fieldwork for the survey is expected to take about 6 months. 
Processing the survey results and matching the results with 
administrative records to produce accurate State-level 
undercount estimates is expected to take until the fall of 
1983 or the beginning of 1984. Since additional work would 
have to be done to actually calculate the undercount 
adjustments and apply them to each geographic unit, it seems 
likely that the adjusted population figures would be available 
around April 1984. It is assumed here that all of the 
statistical work required for undercount adjustment would 
be done by the Bureau of the Census according to the 
methods described in the paper by Bateman and Cowan. [1] 
Minor changes in the method by which the undercount is 
estimated are not expected to significantly alter the 
consequences of the undercount adjustment. 
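The arithmetic behind the two methods can be sketched in a few lines. Dual-system (capture-recapture) estimation is the standard way a postenumeration survey is turned into an undercount estimate, although the paper does not spell out the formula; every number below is hypothetical.

```python
# A sketch of the two estimation methods described above; all figures
# are hypothetical.

def demographic_analysis(prior_pop, births, deaths, net_migration):
    """Expected population from the demographic balancing equation."""
    return prior_pop + births - deaths + net_migration

def dual_system_estimate(census_count, pes_count, matched_count):
    """Estimated true population: census times PES counts over matches."""
    return census_count * pes_count / matched_count

expected = demographic_analysis(203_000_000, 33_000_000, 19_000_000, 4_000_000)
print(expected)  # 221000000

# A small hypothetical area: 900 people counted in the census, 800 found
# by the PES, 750 matched in both sources.
true_pop = dual_system_estimate(900, 800, 750)
print(true_pop)                       # 960.0
print(round(1 - 900 / true_pop, 3))  # 0.062, about a 6-percent undercount
```

Demographic analysis checks the census total against an independently built expectation, while the dual-system estimate infers the people missed by both the census and the survey from the overlap between them.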

In summary, the key points of the proposed actions are: 

1. Population totals from the 1980 census would be 
adjusted for an estimated undercount for the entire 
United States, the States, the SMSA's, and 
approximately 39,000 local governments that receive 
revenue-sharing funds. 

2. Adjusted population figures would be used for 
allocations of Federal funds but not for congressional reapportionment. 

3. The estimation of the undercount and associated 
statistical work would be done by the Bureau of the 
Census, using a large postcensus survey and 
demographic analysis; this work is expected to take 
about 3 years from the date of the census. 

4. Only population totals would be adjusted for the 
undercount, but variables such as age, race, and 
location would be considered in the adjustment 
procedure to account for differentials in the 


Attitudes Toward the Census 

Aside from undercount adjustment procedures, the 1980 
census will be conducted and the figures tabulated much as in 
1970, with a few significant differences. Substantially more 
money ($200 million) has been spent prior to this census 
than before any previous one on a massive effort to improve 
the rate of response. The Census Bureau has obtained the 
cooperation of groups such as the National Association of 
Broadcasters, the Boy Scouts of America, the National 
Football League, and others to deliver the message on the 
importance of responding to the census. In addition, the 
Bureau's Community Services Program will encourage 
grass-roots support for the census, and the Local Review 
Program will give municipalities a chance to review 
preliminary census figures. This direct involvement in the 
census by local organizations is expected to encourage 
cooperation as well as increase confidence in the results. 

However, there have also been some changes since 1970 in 
the American public's perception of its Government. It has 
become a cliche to refer to the late 1970's as the 
"post-Watergate" period, but there is no question that the 
credibility of Government officials has been damaged. This, 
in combination with the Government's inability to deal with 
the problems of inflation and energy, has created a mood of 
alienation and hostility. So many public statements by 
Federal officials have turned out to be "inoperative" at some 
later time that any statement, no matter how factual, is 
viewed with suspicion. Thus, if the Director of the Census 
Bureau says, quite accurately, "Your census questionnaire is 
confidential," a not inconsequential fraction of the public 
says, "Who are you trying to fool?" Among professional 
people involved with statistics, the Bureau of the Census is 
viewed as an agency of unquestionable integrity, but much of 
the public makes no distinction between one agency of the 
Federal Government and another. 

In Congress, the Census Bureau has been fair game for 
some time. Prior to the 1970 census it was Congressman 
Betts; in 1977 it was Congressman Lehman. The objective of 
the original Lehman bill was quite plain: to wrest control of 
the population figures away from the Census Bureau. A Wall 
Street Journal editorial on the subject was appropriately 
headlined, "Pork Barrel Census." 

There is no doubt that other attempts will be made to 
control the information that the Census Bureau produces. 
The stakes are high, and the Census Bureau is hardly ready 
for political combat. Except during census time, it has only a 
small number of employees, and most of them are in 
Washington. Its annual budget is an insignificant fraction of 
total Federal outlays, and it lacks a powerful constituency 
such as, for example, the Department of Defense enjoys. Its 
vulnerability, in combination with its great value as the only 
objective "factfinder for the Nation," gives the Census 
Bureau the characteristics of an endangered species. 

Census Tabulations 

The census will, of course, take place around the first of 
April. By January 1, 1981, official population totals will be 
given to the President for determining the number of 
representatives from each State. 

Some time early in 1981 the first set of computer tapes 
will be available. These tapes will contain, in addition to 
population totals, several hundred tabulations of population 
and housing data from the short-form questionnaire. If the 
1970 census is any indication, by the end of 1981 thousands 
of reels of data will have been sold by the Census Bureau, 
summary tape processing centers, State Data Centers, and 
others. In addition to being copied many times, these tapes 
will be manipulated in every conceivable way to produce 
statistics in forms to match users' needs. By the end of 1982, 
all summary tapes and most printed reports will have been 
published and distributed. 

Federal and State Use of the Census 

Census statistics have been used for governmental 
purposes since the first census in 1790, but the widespread 
use of census data for funds allocation and business research 
is a recent phenomenon. At the Federal level, many of the 
very powerful political coalitions that operated in the past 
have virtually disappeared. With the lack of any strong and 
lasting political basis for the distribution of money, the 
Federal Government has turned to using an unbiased 
measuring stick— census statistics. Money is still handed out 
on a "discretionary" basis, but more and more transfers of 
funds from the Federal to the local level are based on census statistics. 

Over a hundred Federal programs use census statistics in 
their allocation formulas, but one of the largest and most 
visible programs is revenue sharing. Begun in 1972, it sends 
money directly from the Treasury Department to State and 
local governments based on a formula that uses, among other 
factors, population and per capita income. It is in this 
program that equity was thought to be most affected by an 
undercount. About 39,000 local governments have been 
receiving about $7 billion a year under this program. The 
following table shows the number of revenue-sharing 
recipients in 1976 by population size: 




[Table: revenue-sharing recipients in 1976 by population size, showing the number of areas and the percent of total in each class (over 1,000,000; 500,000 to 999,999; 100,000 to 499,999; 50,000 to 99,999; 10,000 to 49,999; 2,500 to 9,999; 1,000 to 2,499; under 1,000); the cell values are not legible in this copy.] 



We can see that more than two-thirds of all revenue- 
sharing recipients are smaller than an average-size census 
tract, and half of them are about the size of an enumeration 
district. However, there are still over a thousand communities 
of 50,000 or more people receiving these funds, and there 
can be no doubt that many of them have come to depend on 
this money to keep down property taxes. Unless replaced 
with a similar program, revenue sharing is likely to be with us 
for a very long time. 

Many Federal programs use census statistics to distribute 
funds for nonmunicipal areas such as neighborhoods or 
special-benefit districts. The Department of Housing and 
Urban Development has several renewal, rehabilitation, and 
rent-subsidy programs targeted at specific neighborhoods, 
some containing only one or two census tracts. Special 
education funds go to school districts, which seldom follow 
municipal boundaries, and funds for improving water quality 
go to water districts, which may not be units of local government. 

In addition to funds-allocation uses, virtually all Federal 
agencies use census statistics for basic research, for develop- 
ment of new programs, as well as for regulatory purposes. 
The Office of Federal Contract Compliance Programs and the 
Equal Employment Opportunity Commission make heavy 
use of census statistics by ethnicity, race, and sex to 
determine if there has been discrimination in hiring practices. 

For every use of census data in the Federal Government 
there can be found a parallel use in State governments. State 
and county legislative districts are drawn using census 
population figures, and about 12 States including the two 
largest— California and New York— make extensive use of 
decennial population figures for their own form of revenue 
sharing. These States and all the others also use the data for 
land-use planning and regulatory purposes. For example, a 
town in New York must have a certain population before it 
can set its own highway speed limits and perform other local 
government functions. 

All States except Massachusetts participate in a 
cooperative program with the Bureau of the Census to 
produce annual estimates of population for each county and 
SMSA within their borders. The decennial census is, of 
course, the benchmark for these estimates. There can be little 
doubt that, following the Federal pattern, States will 
continue to make extensive use of these statistics in their 
research, planning, and relationships with their local 
government units. 

Business Use of Census Data 

The use of census statistics has grown substantially in the 
last decade. This increase in reliance on demographic data 
coincides with the maturing of the American economy and 
the pervasive use of computers. Business planners are now 
frequently confronted with nearly saturated markets. In that 
kind of environment, market research becomes an essential 
part of staying in business. In addition, new government 
regulations regarding equal employment opportunities 
require that companies show with census statistics that they 
are not discriminating. 

Fortunately, this increase in the complexity of doing 
business has been accompanied by the widespread and 
relatively inexpensive availability of computer technology 
along with the census data on computer tape. During the last 
decade, several private companies were formed to serve 
business's need for demographic statistics. From almost 
nothing, the demographic data business has grown to the 
point that revenues are now conservatively estimated at 
between $10 and $20 million annually and are expected at 
least to double by 1983. By American business standards, 
those are not particularly large revenues, but they have a 
substantial multiplier effect. Corporate decisions involving 
billions of dollars of capital expenditures are made on the 
basis of whether there is a large enough consumer market in 
the right place at the right time. Any market researcher or 
corporate planner would agree that the foundation of 
consumer information is the decennial census. It may take a 
specialized survey to find out what kind of people will buy a 
particular product and how much of it, but it is the census 
that reveals how many buyers are out there and where they are. 

For a variety of reasons, business's use of census statistics 
is not as visible as government's. Because of competition, 
corporate planners are reluctant to divulge which census 
statistics they use and exactly how prominently they figure 
in the decision process. The data they use are frequently 
purchased from a data vendor and not directly from the 
Census Bureau. Also, there are many thousands of corporate 
establishments, each making independent business decisions 
about where to expand, where to contract, what to buy, and 
what to sell. No one could possibly keep track of the 
individual decisions; they only become visible in the aggregate. 

Without going into case histories, several general state- 
ments can be made about the business use of census 
statistics. First of all, extreme accuracy is not essential. It is 
not realistic to expect a business decision to turn on whether 
the percentage of potential buyers was 36 or 36.7 percent of 
the population, or whether the market size was 1.2 or 1.3 
million. No one can predict actual consumer behavior that 
accurately, and as a result, a substantial margin of error is 
usually assumed for planning purposes. Second, timeliness is 
important. If 1- to 3-year-old children are the market for 
your product, 4-year-old census data are not adequate. 
Businesses will make maximum use of the census in 1981 and 
1982 and rely more on estimates after that. Third, the trend 
is often more important than the present condition. If the 
census shows a market size that is sufficient now, the 
investment decision will be different if that market is 
perceived to be growing rather than declining. Fourth, 
markets for restaurants, stores, branch banks, and similar 
types of establishments rarely, if ever, follow any political 
boundaries. That sort of business uses much more small-area 
data (tracts, enumeration districts, or block groups). 

Finally, for most consumer products and services the 
characteristics of the population are more important than the 
size of the total population. Most consumer products are 
purchased by only a segment of the population, which can be 
described in terms of income, age, type of dwelling unit, etc. 

Both business and government are relying on the census 
more every year. As the marketplace becomes more compli- 
cated and unpredictable, business people will purchase and 
use more statistical data on consumers to assist in making 
business decisions. As the fragmentation of American society 
continues, governments at all levels will turn to census 
statistics to mediate among the many diverse groups, each of 
whom wants its share of benefits. In the midst of these 
competing demands is the Bureau of the Census. Although 
highly regarded for its professionalism and total confi- 
dentiality, the Census Bureau finds itself in an uncomfortable 
situation. The census statistics it produces have become so 
valuable that demands for their accuracy may exceed the 
Census Bureau's ability to obtain them, given the attitude of 
the public today. At the same time, the control of these 
statistics is looked upon by some as the key to the Treasury. 
It is in this climate that proposals for undercount adjust- 
ments have surfaced. 


Desired Impact 

The major purpose expressed in undercount adjustment 
proposals is to produce greater equity in the distribution of 
Federal funds. This can therefore be construed as the desired 
impact. In an analogous situation, the desired impact of a 
new highway may be to reduce traffic congestion on local streets. 

The National Academy Panel on Decennial Census Plans 
reports that there are many Federal programs that use 
population or per capita income, and that the largest one of 
these is general revenue sharing. In this program, however, 
income has a greater weight than population; there is a fixed 
amount of money, and there are special constraints that limit 
how much a recipient may gain or lose in funds. The panel 
concludes that, as far as the revenue-sharing program is 
concerned, "adjusting the population count without 
simultaneously adjusting the income data for 
underenumeration (or underreporting of income) could 
result in little or no improvement in the equity of the 
distribution of funds under this program." If the single 
largest formula grant program (accounting for almost 20 
percent of the Federal funds disbursed in this manner) is not 
going to be materially affected by population undercount 
adjustment, then it would appear that the desired impact is 
somewhat muted. Considering the fact that nearly 70 percent 
of the revenue-sharing recipients have a population count 
under 2,500, where any adjustment methodology is least 
certain, the impact may be negligible. 
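The panel's point about the revenue-sharing formula can be seen with a toy calculation. The sketch below is not the statutory formula; it simply weights population by inverse relative per capita income and renormalizes to a fixed pot, which is enough to show why a population-only adjustment largely cancels out. All figures are hypothetical.

```python
# Toy fixed-pot allocation: each area's share is proportional to its
# population divided by its per capita income (an inverse-income
# weight). NOT the statutory revenue-sharing formula; all figures are
# hypothetical.

def shares(pot, areas):
    """areas maps name -> (population, per_capita_income)."""
    raw = {name: pop / pci for name, (pop, pci) in areas.items()}
    total = sum(raw.values())
    return {name: pot * r / total for name, r in raw.items()}

pot = 1_000_000
base = shares(pot, {"A": (100_000, 6_000), "B": (50_000, 9_000)})
# Population-only undercount adjustment: A raised 4 percent, B raised
# 2 percent; income left unadjusted, pot unchanged.
adj = shares(pot, {"A": (104_000, 6_000), "B": (51_000, 9_000)})
print(round(base["A"]), round(adj["A"]))  # 750000 753623
```

Because the pot is fixed and income dominates the weight, even a 2-percentage-point difference in the population adjustments moves area A's allocation by less than half a percent, which is the panel's "little or no improvement" in miniature.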

Many of the other Federal grants from HEW or HUD are 
categorical and for very specific purposes; it is doubtful that 
adjusting population counts would significantly affect the 
equity of funds distribution for those programs. Also, these 
categorical grants generally do not apply to the small 
communities mentioned above. 

However, there are programs at both the State and 
Federal levels in which funds are distributed solely or 
primarily on the basis of population. Here the undercount 
adjustment would certainly have an impact. What cannot be 
determined is the political result of the potential situation in 
which a minority of local governments get more money and a 
majority get less. 

Unavoidable Adverse Impact 

One result of undercount adjustment would be the 
creation of two sets of 1980 census population figures. The 
first set would be the unadjusted figures used for reappor- 
tionment and other purposes for an estimated period of 3 to 
4 years. This first set would be in all published census books 
for all census geographic areas, would be on all census tape 
files, and would be compatible with other census tabulations. 
The second set would be published early in 1984 and be only 
for regions, States, and local governmental units. A 
municipality applying for a HUD neighborhood 
rehabilitation grant would require a directive from HUD 
instructing it as to whether the difference between the 
adjusted and unadjusted figures at the city level should be 
allocated down to each census tract. 
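One illustrative answer to the question such a HUD directive would have to settle is to push the city-level difference between adjusted and unadjusted totals down to tracts in proportion to each tract's unadjusted count. This rule and all the figures below are hypothetical, not anything proposed in the text.

```python
# Hypothetical pro-rata rule for carrying a city-level adjustment down
# to census tracts; the rule and all counts are illustrative only.

def allocate_difference(city_adjusted_total, tract_counts):
    """Raise tract counts pro rata to match the adjusted city total."""
    city_unadjusted = sum(tract_counts.values())
    difference = city_adjusted_total - city_unadjusted
    return {tract: count + difference * count / city_unadjusted
            for tract, count in tract_counts.items()}

tracts = {"tract_1": 4_000, "tract_2": 6_000}
print(allocate_difference(10_500, tracts))
# {'tract_1': 4200.0, 'tract_2': 6300.0}
```

The tract figures sum to the adjusted city total by construction, but each tract receives the same percentage adjustment, which is exactly the uniformity that makes sub-city adjustment statistically contentious.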

Dr. Keyfitz summarizes the impact of this situation when 
he says, "Keeping two sets of books is as confusing for a 
statistical agency as for a business— there is perpetual 
uncertainty about which set to use for each application that 
arises." But the unavoidable adverse impact of "two sets of 
books" will unfortunately not stop at confusion. Larger 
municipalities can logically be expected to ask this obvious 
question: If the Census Bureau feels that it can estimate 
undercount for nearly 20,000 places with a population of 
under 1,000, why shouldn't there be an adjustment for every 
enumeration district or block group in each city, or in each 

The Irreversible Commitment of Resources 

Adjustment begets more adjustment. The National 
Academy panel points out that the distribution of 
revenue-sharing funds would be substantially more affected 
by adjusting for underreported income and suggests that the 
Census Bureau work toward the goal of adjusting 
characteristics of the population as well as the count. The 
current plans for demographic analysis and the 
postenumeration survey will cost at least $20 to $30 million; 
additional adjustment will cost many more millions. As more 
adjustment is called for, more money and staff time at the 
Census Bureau will be devoted to this purpose and diverted 
from the task at hand— taking the census. But this is only one 
irreversible commitment of resources. 

There is another commitment of resources, the total 
impact of which is more difficult to assess but which will 
certainly occur. This is the time and money that will be spent 
in lawsuits. To quote Dr. Keyfitz, "The courts and the public 
will be treated to a fireworks display of statistical and 
demographic exposition, with a generous mixture of truth 
and fallacy." No one can predict the outcome of these suits, 
but it is safe to say that much professional time will be 
diverted from more productive work. The Census Bureau has 
been sued by a group that is against counting illegal aliens for 
reapportionment. There must be a large but as yet unknown 
undercount among aliens. How this group's population 
should be adjusted may be determined more by legal 
precedent than sound statistical practice. Fortunately, that 
suit was decided in favor of the Census Bureau. 

Long-Term Use and Productivity 

The impact on long-term use and productivity will be 
substantial. The major impact will occur because attention 
will be diverted from obtaining the most accurate census 
possible to obtaining the most advantageous adjustment. 
Adjustment for the undercount cannot be hidden from the 
public, from census enumerators, from local public officials, 
or from the media. The total cooperation of all of these 
people is essential in obtaining the best possible response 
rates. Right now the incentives to publicly support the 
census are enormous. But once it has become known that the 
figures will be "fudged"— and that's the way it will be 
reported in the media— the incentives to cooperate will be 
replaced by the incentive to dispute the adjustment. 

The National Academy Panel reported on the subject of 
recruitment of enumerators that "the experiences of 
temporary workers during one census may be factors that 
affect their willingness, and the willingness of others, to work 
in the next census." Enumerators, like any workers, will 
perform better and be more productive when they believe 
that their work is important. Shifting the focus from 
obtaining as complete a census as possible to undercount 
adjustment diminishes the importance of 250,000 workers 
and will have a long-term negative impact on their productivity. 

There will also be a negative impact on the productivity of 
the permanent staff at the Census Bureau. Using an 
adjustment procedure that is known in advance to have some 
"arbitrary features" is to invite attacks from all sides. 
Undercount adjustment has been, and will continue to be, 
viewed as the most vulnerable part of the Bureau of the 
Census. Future attempts to control census figures will start 
with the idea of having some "disinterested third party" do 
the undercount adjustment. The amount of Census Bureau 
staff and management time spent on defending the 
adjustment procedures and defending itself will seriously 
impair its ability to carry out its hundreds of other statistical programs. 

Mitigating Measures 

There are some mitigating measures that might be taken. 
The first would be to limit the production of undercount 
estimates to the State level. This would have to be 
accompanied by an unequivocal statement by the Director 
that it would be arbitrary and statistically indefensible to go 
any lower. Given these estimates, Congress could then decide 
to amend existing legislation or pass new legislation 
incorporating an adjustment factor for each State. 
Calculations could be made in advance of how much a State 
would gain or lose with or without adjustment. The decision 
whether or not to incorporate an adjustment becomes a 
political matter. Certainly, more complicated issues have 
been handled by the political process. 

The estimates of differential undercount have been 
limited to age, race, sex, and ethnic origin. By continuing to 
limit the estimates to the short-form population items, the 
Census Bureau will avoid the statistical quagmire of adjusting 
such items as income or housing value for sampling error, 
underreporting, nonresponse, enumerator bias, etc. Here 
again, a clear statement from the Census Bureau on the 
arbitrary nature of such adjustments would do much to end 
the debate. The public image of the Census Bureau would 
not be enhanced if it were reported that after the census 
questionnaires were processed, the Bureau would examine 
income tax and social security records to check the reported 
incomes. Few people would believe that was a one-way street. 

Impact on Business 

Initially, the impact of adjustment on the business use of 
census information is not expected to be very great. By the 
spring of 1983, most businesses and all private data 
companies will have obtained the figures they need. In any 
case, reference would be made to the printed reports or 
computer tapes, which would not contain adjusted numbers. 
After 1983, most business planners will probably use current 
estimates or projections based on the trends in unadjusted 
data. There may be some confusion in the marketplace with 
two sets of 1980 population figures, but most market 
researchers would probably just ignore the adjusted counts. 
In any case, the people who use small-area data would not 
have any choice but to use the data on the summary tapes. 

The long-term impact may be substantial. If the quality of 
the census enumeration is impaired or the integrity of the 
published figures compromised, business planners will lose 
confidence in the statistics and look for an alternative. 
Unfortunately, there really is no satisfactory alternative. No 
one else produces statistical benchmarks. As the uncertainty 
of a business investment increases, so does the aggregate cost 
of that investment. Part of controlling inflation consists in 
lowering the cost of doing business, not raising it. 

The Alternative 

Is there an alternative to undercount adjustment? Perhaps. 
The most important reason for lack of a 100-percent 
enumeration is the attitude of certain segments of our 
population. Feelings of hostility, alienation, and a desire for 
personal privacy will continue to exist. These can only be 
overcome by a substantial commitment to educate the public 
on the importance of the decennial census. Part of this 
commitment must be, of course, to guard against trivial 
questions getting on the census form. Another part is to 
continue to maintain total confidentiality. But a large part 
would be to let every public official know that the final 
census count would be the last word for funding purposes. 
Much more attention could then be paid to creation of 
address coding guides, intercensal estimates, local review, and 
other activities that aid the accuracy of the census. The 
census is currently promoted through an intensive advertising 
campaign that begins a few months before the census and 
ends shortly thereafter. But a major effort at public 
education must be an ongoing program, not just once every 
10 years. Last year, for the first time, a section on 
demographic trends appeared in the President's budget. This 
is recognition of the importance of demographic change and 
a beginning in the educational process. The independent 
Population Reference Bureau, with a very small staff and 
limited budget, carries on an effective program of education 
about population matters in primary and secondary schools. 
The College Curriculum Support Project in the Census 
Bureau provides materials for college-level courses. Their 
work is a useful model of how to develop public awareness of 
certain issues. 

These programs are at present severely limited in terms of 
staff and funding. An expansion of these activities, 
supplemented with some of the innovative ideas that have 
come from the new Census Promotion Office, could have a 
positive impact on public attitudes for every census activity. 
The alternative to statistical adjustment is to change attitudes 
about our Government's need for information and about the 
decennial census. It is important that we resist the 
temptation to adjust census figures in the endless search for 
perfect fiscal equity, because, ultimately, we will have 
neither equity nor a decent statistical environment. 


1. Bateman, David, and Cowan, Charles. "Plans for the 1980 
Census Coverage Evaluation." Paper presented at the 
annual conference of the American Statistical Association, 
San Diego, Calif., August 1979. 


Robert P. Strauss 

Carnegie-Mellon University 


In the papers before us, Francese has argued that a viable 
alternative to correcting the 1980 census for a likely under- 
count would be ". . .to change attitudes about our Govern- 
ment's need for information and about the decennial census." 
This educational approach to correcting errors is justified in 
his view because neither fiscal equity 1 nor a decent statistical 
environment will result from undercount corrections. Keyfitz 
is somewhat more sympathetic to making corrections, lays 
out for us some of the options available for correcting the 
undercount, and concludes that if any correction is to be 
made, it should be a very simple one. He suggests, for ex- 
ample, that if a 6-percent error rate for blacks is found, then 
every black should be multiplied by 1.06. 

I find both positions, doing nothing for fear of the risks 
and doing something simple on the grounds of general intelligi- 
bility, to be unacceptable postures for the Federal Govern- 
ment to take. Keyfitz's approach (a sort of "rough justice") 
suffers from not taking advantage of all the scientific 
evidence that is likely to be available. As such, it might well 
be found to be arbitrary and capricious, were it litigated, 
which in turn could block its ever being implemented. 
Also, on scientific grounds, I find not using all available infor- 
mation objectionable. With respect to Francese's view, I find 
the equities to be sacrificed for the reasons provided 
far too costly in comparison to the gains from inaction. 
My perspective on the 1980 census undercount is the 
same as that held on the 1970 undercount: It should be 
corrected and reflected in official Census Bureau statistics in 
as much detail as analysis can provide (race, sex, age, income, 
geographic area, etc.). 2 In my remarks below, I elaborate 
this view by examining in more detail the equities at risk 
and by discussing some of the administrative arguments 
made against correcting for the undercount. 


Much of the justification for acting to correct the likely 
undercount in the 1980 census is based on the inequities 
that will result from subsequent use of the data in Federal 
grants-in-aid formulas. I think that this justification can be 
sharpened considerably by examining the equity issue in 

1 Francese notes that for fiscal equity purposes— the use of Census 
data for allocating Federal revenue sharing funds to local govern- 
ments—the correction of the population undercount is not as im- 
portant as the correction of the income data. He then cites a National 
Academy (1978) study in support of this. If we view "importance" 
on an analytical or logical basis, this is not correct, for it can readily 
be shown that the basic intrastate formula can be rewritten as: 


    %_i = (P_i^2 T_i / Y_i^2) / Σ_{i=1}^n (P_i^2 T_i / Y_i^2)    (1) 

where 

%_i is the percentage share of a fixed dollar amount for the ith government 
P_i is the population of the ith government 
T_i is the adjusted taxes of the ith government 
Y_i is the total census money income of the ith government 

Clearly, P and Y enter the above statement in symmetric, albeit 
opposite directions. 

Robinson and Siegel report empirical results for correcting P 
and Y and noting changes in revenue sharing payments, but for 
different problems. Their correction of P was for undercounts, while 
their correction of Y in their experiments was for underreporting of 
income by those who were initially counted. It is not clear that their 
results relate entirely to the question of whether corrections to P 
or Y are empirically more important in the formula, since they did 
not attempt to impute income to the undercounted, which would 
be the other component of measurement error in the Census Bureau 
money income concept. Also, correcting Y for possible underreport- 
ing of income through the use of BEA personal income data is an in- 
direct method at best, since the BEA income concept is far more 
inclusive. Their corrections, and therefore their empirical results, reflect 
both the correction for underreporting and the use of a broader 
income concept. Of course, if ΔP = ΔY resulting from the corrections, 
then %_i will be unchanged. 

I might also note their correction of P in their simulations under- 
states the effect of a correction in P, as they view (1) as: 

    %_i = (P_i T_i / (Y_i PCI_i)) / Σ_{i=1}^n (P_i T_i / (Y_i PCI_i))    (2) 

Since it is well known that PCI is a derived figure from the raw 
Census data, PCI_i = Σ_j Y_ij / Σ_j P_ij, they in effect fail to correct per capita 
income's denominator by only adjusting P_i in (2). 
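The algebraic point of this footnote, that the three-factor form with 
PCI_i = Y_i / P_i collapses to the P_i^2 T_i / Y_i^2 form, and that 
adjusting P without re-deriving PCI understates a population correction, 
can be checked numerically. The sketch below is illustrative only: the 
jurisdictions, populations, taxes, and incomes are invented, and the two 
helper functions are hypothetical names, not anything from the paper. 

```python
# Hypothetical figures for two jurisdictions; all numbers are invented.
P = [100.0, 200.0]    # population
T = [10.0, 30.0]      # adjusted taxes
Y = [1000.0, 2500.0]  # total census money income

def shares_formula_1(P, T, Y):
    # %_i = (P_i^2 T_i / Y_i^2) / sum_j (P_j^2 T_j / Y_j^2)
    w = [p * p * t / (y * y) for p, t, y in zip(P, T, Y)]
    return [x / sum(w) for x in w]

def shares_formula_2(P, T, Y, PCI):
    # %_i = (P_i T_i / (Y_i PCI_i)) / sum_j (...), with PCI supplied separately
    w = [p * t / (y * pci) for p, t, y, pci in zip(P, T, Y, PCI)]
    return [x / sum(w) for x in w]

# With PCI derived from the same raw data, (2) collapses to (1).
PCI = [y / p for y, p in zip(Y, P)]
assert all(abs(a - b) < 1e-12
           for a, b in zip(shares_formula_1(P, T, Y),
                           shares_formula_2(P, T, Y, PCI)))

# Correct a 6-percent undercount in jurisdiction 0.
P_adj = [P[0] * 1.06, P[1]]
full = shares_formula_1(P_adj, T, Y)         # PCI re-derived implicitly
partial = shares_formula_2(P_adj, T, Y, PCI) # stale PCI, as in the simulations
print(full[0], partial[0])  # the partial correction moves share 0 less
```

The full correction scales jurisdiction 0's weight by 1.06 squared, the 
stale-PCI correction by only 1.06, which is the understatement at issue. 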

2 See Robert P. Strauss and Peter B. Harkins (1974), The 1970 
Census and Revenue Sharing: Effects on Allocations in New Jersey 
and Virginia (Washington, D.C.: Joint Center for Political Studies), 
or "The Impact of Population Undercounts on General Revenue 
Sharing Allocations in New Jersey and Virginia," National Tax 
Journal, Vol. XXVII, No. 4 (December 1974). 



terms of who benefits from the grants, the distinction be- 
tween horizontal and vertical equity as usually made in the 
public finance literature, and the way these equity 
issues impact over time, e.g., over the next 10 years. 

I take it to be axiomatic that correction of the undercount 
in the 1980 census constitutes a "benefit." The analytical 
problem entails weighing the benefits against the costs. 

It is sometimes argued that in the case of fixed amount 
grants, there are going to be "winners" and "losers." As a 
result of the change in allocation due to a correction, some 
are worse off and some are better off. It then might follow 
that the two need to be balanced before making the correc- 
tion. This is not the place to get into an extended discussion 
of whether Pareto redistribution rules are appropriate cri- 
teria, but I would argue that most theoretical social welfare 
analysis treats positive and negative deviations from the 
"true" or desired outcome— in this case, more accurate state- 
ments of population, income, etc.— on an equal footing. 
Put another way, the loss function is in absolute difference 
terms, not the sum of pluses and minuses. Since there is a 
fixed sum to be allocated, the total amount of gain and loss 
must necessarily be equal, so arguing for a winner/loser 
count as a criterion would always amount to a decision to 
take no action. 

Equity and the Unit of Analysis 

Because general revenue sharing (GRS) is so large in 
dollar value (roughly $6.8 billion/year) and so ubiquitous 
in terms of allocation to general government (roughly 39,000 
jurisdictions), it has been taken for granted in most discus- 
sions of improving the data on these jurisdictions that the 
improvement in data will benefit the governments. Because 
the governments literally receive Federal checks from GRS 
and other grant-in-aid schemes (including, of course, State 
grant-in-aid schemes), it is commonplace to assume that the 
State and local governments are the beneficiaries of improved 
population figures. By contrast, I would argue that the 
grants are there for the provision of services by the govern- 
ments to individuals, and the beneficiaries of improved data 
are individuals. When we talk about more accurately assuring 
individuals, rather than governments, of their proper amount 
of Federal aid, I believe the nature of the concern changes. 
That is, when we talk in some sense about "fairness," which 
is the usual synonym for equity, I think the debate about 
whether or not to correct for the undercount becomes more 
compelling when we address first the impact of the correction 
on individuals, rather than on governments. 

The Horizontal/Vertical Equity Distinction and 
the Case for Correcting Undercount 

It is common to distinguish between the treatment of 
individuals in the same economic situation— this involves 

matters of horizontal equity— and the treatment of individ- 
uals in different economic situations— this involves matters of 
vertical equity. In taxation, the principles are that, in the 
horizontal case, the individuals should pay the same taxes, 
both in amount and of course in rate. In the vertical case, 
it is generally accepted that the tax system should require 
equal sacrifice among individuals of differing abilities to 
pay, and that under most interpretations of the varying 
utility of income, this will involve a progressive tax system 
in which the tax rate on low-income persons is smaller than 
the rate on high-income persons. 

Now, I would judge much of the discussion over the 
"equities" at risk in the two papers to tacitly assume that 
horizontal equity is at risk, ignoring the issue of vertical 
equity. This is especially the case, since the unit of analysis 
is assumed to be the governmental unit. The sort of argu- 
ment I hear in these and other papers seems to be that if 
we can get the populations right, then the governments will 
be put on an equal footing. If the numbers were not to 
change much (1 to 3 percent), it could be argued (I think 
this underlies Keyfitz's simplified approach) that they are 
already on a nearly equal footing; therefore, why bother? 
When one views the beneficiary of Federal aid to be indi- 
viduals, however, I believe the problem of horizontal equity 
looms larger, and one becomes more reluctant to ignore 
3-percent errors. 

If one views vertical equity to be at risk initially, however, 
I think again the argument for making the corrections is 
more compelling. For example, I think the motivation for 
general revenue sharing and most other grant-in-aid pro- 
grams is essentially redistributive in nature. That is, the 
purpose of most if not all Federal grants is to achieve a 
defined vertical equity through the redistribution of Federal 
funds to individuals. Viewed in vertical terms, the failure 
to correct for the undercount is a failure 
of Federal programs to redistribute to less well-off indi- 
viduals from better-off individuals. Because the less well-off 
tend to be black, young, and male, the loss in vertical equity 
of not correcting for their being undercounted is substantial. 
It is precisely because many Federal programs attempt to 
assist these individuals that their undercount is so egregious 
and demands correction. 

Equities at Risk Over Time 

In computing costs and benefits, it should be remembered 
that the benefits accrue over time, and unless we have a mid- 
decade census, the benefits will accrue through 1990. In 
deciding whether or not to correct for the undercount, these 
benefits and any associated costs should be discounted back 
to the present, using some appropriate discount rate. 

To get some rough magnitude of the equity risk, let us 
assume that Federal grants are to rise at 10 percent per year 
over the next 10 years, that a 10-percent discount rate is 
appropriate, and that there are $50 billion per year at issue 
with a possible error of a 1-percent undercount rate. Back-of- 
the-envelope calculations then suggest that $50 million per 
year is at risk, or $500 million for the entire period. If we 
are willing to weight the value of not getting funds properly 
to the poor more heavily than erroneously giving funds to 
the rich, e.g. weight the absolute values of the errors, then 
the benefit will be larger. 
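The back-of-the-envelope arithmetic above can be reproduced in a few 
lines. Taking the passage's assumptions (grants growing 10 percent per 
year, a 10-percent discount rate, and $50 million per year at risk), 
growth and discounting exactly offset, so each of the 10 years 
contributes the same present value: 

```python
growth = 0.10    # assumed annual growth of Federal grants
discount = 0.10  # assumed annual discount rate
at_risk = 50e6   # dollars per year at risk, per the text

# Present value of the at-risk stream over the decade to 1990.
pv = sum(at_risk * (1 + growth) ** t / (1 + discount) ** t
         for t in range(10))
print(pv)  # 500000000.0, i.e., $500 million for the entire period
```

Were Federal aid to grow more slowly than the discount rate, as the 
later discussion suggests, each year's present-value contribution would 
shrink and the total would be somewhat smaller. 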

It is difficult for me to envision that a loss of this order 
of magnitude would not warrant concerted action by the 
Bureau as well as the Office of Statistical Policy and Stand- 
ards to correct for the undercount. To be sure, these are 
rough assumptions, but indicative of the equities at risk over 
time. It will be interesting to see whether a benefit of this magni- 
tude is matched by a corresponding investment cost to 
correct for the undercount. 




Both papers indicate that the Bureau, were it to correct 
for the undercount, would have to keep, in effect, two sets 
of books. Because the correction might come substantially 
after the initial enumeration, it is argued that the Census 
would have to publish another complete set of documents. 
This would be somewhat expensive (paper, personnel, and 
computer costs), open the Bureau to possible litigation and 
congressional intervention, and be generally untidy. 

As an economist, indeed even as an academic economist, 
I must say that I find the conflict surrounding the under- 
count issue to be a sign of health and vitality in our Federal 
statistical system. It is understandable that an agency pro- 
ducing a product, the Bureau and its numbers, does not like 
to have the product's quality questioned nor be told how to 
produce its numbers. Fortunately, even though the Bureau 
is a public monopoly, it has participated in the analysis of 
its strengths and weaknesses. Economic theory generally 
predicts that competition, which is often accompanied by 
conflict and in this case administrative aggravation, fre- 
quently improves product quality. Thus, I view the aggravation 
that might result from the correction of the undercount to 
be a benefit rather than a cost. 

This conflict over the undercount that has occurred over 
the last few years was certainly predictable in view of the 
change in the way the Federal Government has been making 
grant-in-aid payments. The movement from discretionary 
awards to formula-based grants ensured that there would be 
more scrutiny of the underlying data base, and as a result, 
more pressure for correction of known errors. With respect 
to the future, I think it is quite clear that, as a result of the 
slowdown in the long-term growth path of the economy, 
concern over the size of the public sector, and the desire to 
balance the budget, Federal aid will grow more slowly than 
in the past, perhaps less than the 10 percent used above. The 
implication of this for the aggravation content of our Federal 

statistical data base is that it will increase, as people compete 
for scarce dollars by arguing that the data need adjustment. 
I view this conflict, however, to be inherently helpful to 
ensuring that we have the best statistics in the world. If there 
is no constant scrutiny and review of our statistical ma- 
chinery, especially in view of the centralization in Federal 
statistics promoted by this Administration, I am concerned 
that an inferior product will result. 

With respect to the confusion that is said to possibly re- 
sult from the Bureau keeping two sets of books, I think 
several observations are in order. First, there is precedent for 
correcting data from a census when it is put in machine 
readable form. The per capita incomes that were printed in 
the 1970 Fourth-Count volumes and those made available 
on tape much later were not identical. More convincing, 
however, is the point that Federal statistics are generally 
updated, revised, rebenchmarked, and corrected. For ex- 
ample, our GNP series, which is used for a wide variety of 
public policy purposes, goes through several corrections, 
and the user community has not been perplexed by this. 
Also, there is ample precedent in the general revenue shar- 
ing program for correcting data; Treasury has for many 
years been giving governmental recipients the adjusted tax 
data and asking them to approve it or complain about it. 
Admittedly, it is more difficult for the population and 
income data to be handled on that basis, but the contrast 
between the Treasury's inviting criticism and the Bureau's 
reluctance to correct for known errors is surely striking. 


The chief justification in my mind for correcting the 1980 
census for the likely undercount involves the equities that 
will be at risk if nothing is done. Both papers acknowledge 
that, because the results of the census will be widely used, 
there are grounds for making the corrections. However, 
neither finds the equities at risk so compelling that a 
complete correction should be made for the known undercount by 
age, race, sex, income, etc. In my review of the equities at 
risk, I have tried to demonstrate that the most significant 
equities at risk involve the income redistribution efforts of 
the Federal Government vis-a-vis individuals, not govern- 
ments, as is frequently suggested. When we focus on the 
vertical equities of individuals that will be at risk if no cor- 
rection is made, especially in light of the amounts of money 
and the extended time period involved, the concerns over 
administrative inconvenience that such a correction might 
entail appear to be minor. Finally, there is ample precedent 
in a wide variety of other Federal statistics for the correction 
of known data errors. 

In sum, I think the only responsible stance that can be 
taken vis-a-vis the undercount is that the 1980 census be 
speedily corrected for it. To do less would violate the public 
trust the Bureau enjoys to create the most accurate data 
possible. 

The question was raised as to how a statistical agency 
attempts to implement the intent of Congress. It was ob- 
served that Congress is not reapportioned by a complex set 
of calculations such as the gross national product. However 
simplistic a census adjustment might be, it still would re- 
quire two "sets of books," and this would be a mistake. 
It was also stressed, however, that opening the door to ad- 
justment does not necessarily lead to increased litigation; 
the courts' contribution— through relying on experts— would 
be slight. An immediate crude adjustment that can be refined 
later as the data become available was advocated. For ex- 
ample, each black might count for 1.06 persons in revenue- 
sharing calculations. This would avoid two sets of figures 
and could even be legislated. Even in the absence of new 
legislation, if the relative underenumeration of a group is 
known, a factor could be applied to increase the amount of 
funding at, say, the municipal level. This assumes that the 
true population is not known. What is being sought is a way 
to prevent the intrinsic error from affecting equity for major 
groups. Although it was feared by others that an arbitrary 
1.06 adjustment would not stand up in court, as it would 
not take advantage of all of the information available, a 
simple solution such as this would forestall a "free-for-all" 
of constructions based on the census by local officials. 
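The simple, legislatable adjustment discussed above (a single factor 
per underenumerated group, applied in funding calculations only, not to 
the official count) amounts to a weighted sum of group counts. A 
minimal sketch, assuming the 1.06 factor for blacks mentioned in the 
discussion; the municipality's counts and the second group's factor are 
invented for illustration: 

```python
# Relative underenumeration factors by group; 1.06 for blacks follows
# the discussion above, the other value is a placeholder.
factors = {"black": 1.06, "other": 1.00}

# Hypothetical enumerated counts for one municipality.
counts = {"black": 50_000, "other": 150_000}

# Adjusted population for use in funding formulas only.
adjusted = sum(counts[g] * factors[g] for g in counts)
print(round(adjusted))  # 203000 versus an enumerated 200000
```

Because the factor is applied uniformly within a group, this carries 
exactly the synthetic-method assumption questioned later in the 
discussion: that the group's undercount rate is the same everywhere. 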

It was noted, however, that the largest redistribution 
program depends more on income than on population, 
so that the effect will be negligible if an adjustment is made 
for population alone. Nonetheless, there was little discus- 
sion of a census adjustment for income. 

It was noted that adjusted figures have not been published 
for previous censuses, and it was presumed that unadjusted 
figures will be published for 1980. It was suggested that the 
Bureau should also publish, however, some adjusted figures 
that are arrived at by a convention determined by a desig- 
nated "censor" and the decisionmaking bodies. The two sets 
of figures will differ, but this situation will be an incentive 
to reduce the differential in 1990. Further, the decision- 
making bodies will have more liberty in choosing the figures 
to be used. There will be some public confusion, but this 
should be reduced with time and use of the data. Other 
participants reacted that two sets of census figures should 
never be published, however. They argued that nothing other 
than an official count should be issued. The Bureau might 
make available estimates of the undercount, with which the 
user could do his own arithmetic to arrive at adjusted figures. 
In this way the population count would not be tested. 

The group returned to a discussion of the advantages of a 

simple adjustment procedure. It was questioned whether a 
simple method would rid the census of gross errors when 
some tests of more elaborate techniques correlate poorly 
with the simple estimates being proposed. Some findings 
also suggest that adjustment for age and sex is unimportant; 
the key item is race. Others indicate that geographic location 
may be most important; e.g., central city versus suburb. 
Further, the National Academy of Sciences found that the 
accuracy of synthetic estimation is not great, and its feasi- 
bility is questionable. The main point is to choose an ac- 
ceptable convention. If the postenumeration survey (PES) 
gives the equivalent of a complete count, then regressions 
can be made on the PES, but there should not be a large 
number of adjustments to attain minor gains in accuracy. 
In the past, PES's have been incomplete, and simply pro- 
vided another variant. It may take careful demographic 
analysis to arrive at gross estimates for race at the State 
level, so there is no "simple" method to obtain gross 
estimates. 
It was also thought to be doubtful that there is a single 
"best" adjustment. The pressures to produce one were 
questioned and some participants felt that it would be better 
to attempt several, such as in the Bureau's projections 
where "A," "B," and "C" series are produced. 

The Census Bureau indicated that it is working with a 
modification of a simple synthetic method using race (and 
age and sex, if desired) to take into account variations in 
income, and this caused large modifications in the alloca- 
tions. Demographic estimates of error do not correlate very 
highly with simple synthetic procedures. If they are con- 
verted to revenue-sharing allocations, they will differ radically 
from the simple synthetic estimates. Revenue sharing illus- 
trates the effects of changing coverage errors. The PES 
presents serious problems with regard to establishing the 
level of coverage for individual States, but the PES results 
can be merged with the demographic estimates in a weighting 
procedure. A moderate approach was suggested, however. 
Since some of the more complex methods for adjustment 
still need to be worked out, the Bureau was cautioned 
against going too far too fast. It was argued that matching 
a PES with address registers gives uncertain results. 
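The proceedings do not specify the weighting procedure by which PES 
results might be merged with demographic estimates. One common way to 
combine two independent estimates is inverse-variance weighting; the 
sketch below assumes that method, and every number in it is invented: 

```python
# Two estimates of a State's undercount rate, with rough variances;
# all figures are hypothetical. Inverse-variance weighting gives the
# more precise estimate (here, demographic analysis) the larger weight.
pes_rate, pes_var = 0.025, 0.0004    # postenumeration survey
demo_rate, demo_var = 0.031, 0.0001  # demographic analysis

w_pes = (1 / pes_var) / (1 / pes_var + 1 / demo_var)
combined = w_pes * pes_rate + (1 - w_pes) * demo_rate
print(round(combined, 4))  # 0.0298
```

Whatever the actual procedure, the idea is the same: neither source is 
taken as the level for a State by itself, and the weights reflect the 
relative reliability of each. 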

It was emphasized also that even simple adjustment de- 
pends on technological capabilities. The models for adjust- 
ment are not simple, and there are no unique choices, so 
much testing is needed, and there is no time for that for 
1980. Caution was recommended for the immediate future, 
but for the long run, active research was suggested as the 



only reasonable course. It may, in fact, not be feasible to 
adjust under present conditions. 

The importance of the census figures in Congress was 
stressed, however. Litigation is inevitable should nothing 
be done in the way of adjustment. The problem will not 
"go away." It was argued that attempting to ignore the 
problem is a dangerous position. The only protection against 
litigation is to say that this is the best that can be done at 
this point. That "best" must not be "arbitrary and capri- 
cious," but the situation cannot wait for extended research 
to identify a generally recognized best method. It was argued 
that the impact of the undercount is too important not to 
adjust, and that even a simple procedure might leave every- 
one unhappy but quiet. There was some concern with 
equity in a simple solution, but it was felt that equity 
decisions should be left to the Congress, which should also 
specify the methodology and write the nature of the adjustment 
into its formulas. Congress might take a simple approach with 
technical advice from the Census Bureau. 

Although the Congress might act in several ways, one 
could be to have the Census Bureau publish the actual count, 
followed by estimates of the underenumeration. Congress 
would have the option of how these figures could be used 
in other legislation. This would follow the Canadian ex- 
perience, where it was clear that the Parliament did not 
know what to do; the statistical agency is best qualified in 
this area. The Census Bureau probably will not be allowed 
to do nothing, but it cannot simply present its research 
results; it must offer reasonable minimal action for getting 
closer to equity. 

The Census Bureau emphasized the distinction between 
counts and estimates and stressed the need to label them 
accordingly. A simple estimate is justified only if it is also 

the best one possible. The complicated nature of the esti- 
mation process also was underscored; matching presents 
difficulties, and so does any other method. The synthetic 
method assumes some factors and ignores others (such as 
the lack of data on Hispanics). Urban/rural differences are 
not small; they are curvilinear. It is only a guess that 
blacks who were within-household misses may be concen- 
trated in large cities. 

Attention also was called to the resources affected by 
adjustment; the $50 billion in Federal funds allocated each 
year are only part of the total benefit. There are also State 
and local distributions, and still more in the private sector; 
for example, a decision about locating a plant affects job 
opportunities. There is a need for more accurate data for 
private decisionmaking. It was felt that survey sample weights 
based on the census could affect minorities adversely for 
some time. 

The American Demographics article "The Statistical 
Nightmare" (Vol. 1, February 1980, pp. 18-23) was suggested 
as containing a possible compromise on the issues of com- 
plexity in the adjustment and uses of the adjusted figures. 
It was proposed in the article that the unadjusted figures be 
used for apportionment, but that the adjusted figures should 
be relied upon in the allocation of funds. If only adjusted 
figures were available, the Bureau would be forced to play 
an unintended role, but equity in fund allocation demands 
some adjustment comparable to using estimates between 
censuses. Legislative bodies could use such estimates, which 
the Bureau could publish officially as one component in its 
estimates and projections series. Geographic estimates 
present technical but not political problems, whereas choos- 
ing the best distribution of funds, for example, is a political 
matter, and the lawmakers can make their own choices. 

The Congressional Perspective 

Daniel P. Moynihan 

United States Senate 

It is an honor to address this distinguished gathering on 
the urgent and important subject of the "census under- 
count." I would begin by recalling a similar event held here 
in Washington 13 years ago, under the joint auspices of the 
Bureau of the Census and the Harvard-M.I.T. Joint Center 
for Urban Studies, of which I was then Director. That con- 
ference was, to my knowledge, the first ever held on the 
problem of the "undercount," and I sketched its origins 
and purposes in a foreword to the conference's proceedings 
as published by the Joint Center in 1968: 

At one point in the course of the 1950's John Kenneth 
Galbraith observed that it is the statisticians, as much as 
any single group, who shape public policy, for the simple 
reason that societies never really become effectively con- 
cerned with social problems until they learn to measure 
them. An unassuming truth, perhaps, but a mighty one, 
and one that did more than he may know to sustain 
morale in a number of Washington bureaucracies (hateful 
word!) during a period when the relevant cabinet officers 
had on their own reached very much the same conclusion— 
and distrusted their charges all the more in consequence. 
For it is one of the ironies of American government that 
individuals and groups that have been most resistant to 
liberal social change have quite accurately perceived that 
social statistics are all too readily transformed into poli- 
tical dynamite, whilst in a curious way the reform tem- 
perament has tended to view the whole statistical process 
as plodding, overcautious, and somehow a brake on 
progress. (Why must every statistic be accompanied by 
detailed notes about the size of the "standard error"?) 

The answer, of course, is that this is what must be done 
if the fact is to be accurately stated, and ultimately ac- 
cepted. But, given this atmosphere of suspicion on the 
one hand and impatience on the other, it is something of 
a wonder that the statistical officers of the Federal 
Government have with such fortitude and fairness re- 
mained faithful to a high intellectual calling, and an even 
more demanding public trust. 

There is no agency of which this is more true than 
the Bureau of the Census, the first, and still the most 
important, information-gathering agency of the Federal 
Government. For getting on, now, for two centuries, the 
Census has collected and compiled the essential facts of 
the American experience. Of late the ten-year cycle has 
begun to modulate somewhat, and as more and more 
current reports have been forthcoming, the Census has 
been quietly transforming itself into a continuously flow- 
ing source of information about the American people. In 
turn, American society has become more and more 
dependent on it. It would be difficult to find an aspect 
of public or private life not touched and somehow shaped 
by Census information. And yet for all this, it is somehow 
ignored. To declare that the Census is without friends 
would be absurd. But partisans? When Census appropri- 
ations are cut, who bleeds on Capitol Hill or in the Execu- 
tive Office of the President? The answer is almost every- 
one in general, and therefore no one in particular. But the 
result, too often, is the neglect, even the abuse, of an 
indispensable public institution, which often of late has 
served better than it has been served. 

The "avowed purpose" of the 1967 conference was that 
"of arousing a measure of public concern about the difficul- 
ties encountered by the census in obtaining a full count 
of the urban poor, especially perhaps the Negro poor"— we 
would now say black. 

Our impetus, in short, was the "undercount," to use 
today's word, that had occurred in the 1960 decennial 
census. "It was hoped," I wrote, "that a public airing of the 
issue might lead to greater public support to ensure that the 
census would have the resources in 1970 to do what is, after 
all, its fundamental job, that of counting all the American 
people. . . (T)he full enumeration of the American popula- 
tion is not simply an optional public service provided by the 
Government for the use of sales managers, sociologists, and 
regional planners. It is, rather, the constitutionally mandated 
process whereby political representation in the Congress is 
distributed as between different areas of the Nation. It is a 
matter not of convenience but of the highest seriousness, 
affecting the very foundations of sovereignty. That being the 
case, there is no lawful course but to provide the Bureau with 
whatever resources necessary to obtain a full enumeration." 

Our focus, clearly, was on obtaining a complete enumer- 
ation, and it is a fact that the Census Bureau made a valiant 
effort to do just that in the 1970 census. But, it is also a 
fact that it failed. This was not an abject failure, however. 
There is some evidence that the 1970 undercount was less 
severe than 10 years earlier. And there is ample evidence— not 
least the convening of today's conference and the commis- 
sioning of the studies and papers that preceded it—that the 
Census Bureau has made a forthright and conscientious 
effort both to estimate the size of the 1970 undercount and 
to develop a range of possible remedies for the future. 

The future has now arrived. The 1980 census is just 
weeks away. A monumental attempt has been made— is 
being made— to obtain as complete an enumeration of the 
population as possible. Yet we know that there will again 
be an undercount. And we are here today to discuss a dif- 
ferent issue than that which absorbed our attentions 13 
years ago. Given that the problem known as the "under- 
count" seems destined to persist in the actual enumeration 
of the population, and given our ability to estimate with 
reasonable accuracy how large it is, should adjustments be 



made on the basis of statistical estimates of the undercount? 
In other words, if we cannot find everyone, but can develop 
defensible estimates of the numbers of persons whom we 
cannot find, should the census figures be adjusted in light 
of these estimates? 

No doubt you are familiar with my own view. Last fall, I 
introduced a bill in the Senate to instruct the Secretary of 
Commerce to adjust the population figures of the 1980 
census to correct for the undercount, and to require every 
Federal official who administers a program under which 
money is distributed according to population data to use 
these corrected figures. 

My bill was in part a response to the conclusions of the 
Panel on Decennial Census Plans, a distinguished group of 
statisticians appointed by former Secretary Kreps, that 
". . . (T)he issue of equity cries out for attention. . . . (T)he 
Census Bureau would be able to respond in an appropriate 
and competent fashion to a directive to adjust the State and 
local population data for the undercount. . . ." Sensitive to 
the Panel's further judgment that "whether to adjust the 
census estimates is largely a policy issue; how to do it is 
primarily a technical one," (emphasis added) I sought to 
provide a basis for resolving the policy issue and producing 
the appropriate "directive" to the Bureau. But legislation 
is not actually required for this purpose. The Secretary or 
the President could issue the necessary directive. Indeed, 
the Bureau could furnish the indicated information, without 
any directive. 

There are three questions that should be addressed in the 
course of this conference. Let me take them up one at a time. 
First, what does the Constitution require? As I read Article 
I, Section 2, and the 14th Amendment, there are two pro- 
visions. The "whole number of persons in each State, exclud- 
ing Indians not taxed" shall be determined every 10 years. 
And the House of Representatives "shall be apportioned
among the several States according to their respective
numbers."
It is well established that the responsibility of the census 
is to count the number of persons to be found in each State. 
It matters not whether they are voters or nonvoters, adults 
or children, citizens, lawfully-admitted foreigners or "illegal" 
aliens, able-bodied or handicapped, English-speaking or not, 
self-supporting workers or dependent persons. Every human 
being physically present within the borders of a State on 
April 1— and some not present— is to be counted, and the 
House of Representatives is thereafter reapportioned accord- 
ing to that count. 

The Constitution is silent on the question of the "under- 
count." One can reasonably assume that the founding fathers 
never anticipated the issue. Theirs was a thinly populated and 
primarily agrarian society in which, to exaggerate only 
slightly, everyone in a community knew everyone else by 
sight if not by name. It was deemed a relatively simple 
matter to count them. 

But the founding fathers were wise enough to recognize 

that they could not anticipate every contingency, so Article 
I, Section 2 further provides that the "actual enumeration 
shall be made. . .in such Manner as (the Congress) shall by 
Law direct." 

The current law provides as follows: 

The Secretary shall, in the year 1980 and every 10 
years thereafter, take a decennial census of population as 
of the first day of April of such year. . .in such form and 
content as he may determine, including the use of sam- 
pling procedures and special surveys. In connection with 
any such census, the Secretary is authorized to obtain 
such other census information as necessary. 

Once again, no mention is made of the "undercount" 
issue, but the law is certainly permissive with respect to the 
procedures by which the Secretary shall enumerate the pop- 
ulation. I am not a lawyer, but it seems clear enough that the 
phrases "in such form and content as he may determine" and 
"including the use of sampling procedures and special 
surveys," are meant to provide the Census Bureau with suf- 
ficient flexibility to produce as accurate and complete a 
count as possible. 

The second question to be addressed is the availability and 
reliability of methods by which completeness and accuracy 
can be enhanced, assuming now— and I do assume— that it 
will never be possible to obtain a complete enumeration 
through traditional census procedures. In this area, I defer 
to the immense sophistication of those gathered here, adding 
only my own impression, as one who has worked with census 
data for many years and has a passing acquaintance with 
statistical methods and techniques, that many such methods 
are available and sufficiently reliable, and that others can be 
developed with relative ease. Clearly, precision and reliability 
will suffer as one gets into the finer grained tables. It may be
fruitless to seek to apply estimating techniques and statistical 
corrections to comparisons of the level of school achievement 
in 32-year-old unmarried women residing in adjoining census 
tracts. It could be done, but not with sufficient reliability to 
make it worth doing. So judgments will have to be made— 
and made by persons such as yourselves— about the specific 
demographic facts that lend themselves to such techniques. 
At the very least, it will be necessary to make clear to those 
unfamiliar with statistical techniques which numbers and 
which differences are significant and to what degree. But 
many of the fundamental facts about our population— 
notably how many people live where— can be estimated with 
a high degree of reliability. And in my view they should be. 

The third question is, what uses should be made of the 
estimated population data as opposed to the enumerated 
population data? I would retain the distinction. Indeed, I 
would have the Census Bureau produce two sets of figures. 
It is not the Bureau's responsibility to determine which 
should be used in which circumstances. I for one do not 
believe (although as a New Yorker this is a statement against 
interest) that the adjusted figures should be used to re- 
apportion seats in the House of Representatives. One ought 


not tamper with a fundamental constitutional procedure that 
has served us since 1790. I am firmly of the view that the 
population count that is used for apportioning the Congress 
ought to be one in which every person is able to be associ- 
ated with a name and an address, not with a statistical
estimate.
I have no such reservations about applying the "second 
set" of census numbers to Federal spending programs, how- 
ever. To the contrary, the decennial enumeration has been 
inadequate for such purposes for a long time, and it is often 
necessary to make interim estimates of one kind or another. 
The point of all such estimates is to ensure that a Federal 
program meant to alleviate a particular problem or condition 
is administered in accord with the most accurate and timely 
information we can obtain about the actual incidence of that 
problem or condition. It matters not whether the program 
pays for health care, social services, compensatory education, 
eases the fiscal burden on State and local governments, cares 
for the handicapped, or subsidizes housing for the needy. 
Insofar as the basis— or a basis— for apportioning Federal 
funds throughout the country is demographic data supplied 
by the Census Bureau, that data should be up-to-date and 
complete. Completeness requires that we adjust for the 
undercount, just as we adjust for population shifts from one 
area to another. This can be done and in my view, justice 
and equity require that it be done. 

The alternative is the continuing politicization of the 

census itself, and a steady decline in public confidence in it. 
If members of any group believe that their numbers are 
understated by the census, they will inevitably (and in my 
view understandably) press for ad hoc adjustments to be 
made in the formulas by which Federal funds are meted 
out; they will seek special enumerations of their group; and 
they will increase the political pressures on the Census 
Bureau to be something less than the respected, neutral 
agency that it is, even as they seek to discredit the data that 
it produces. I cannot doubt that will happen, for as a Senator 
from New York, I would be obliged to respond in not 
dissimilar fashion if I had reason to believe that my own 
constituency was being disadvantaged by the census under- 
count. It is simply the fact that billions of dollars are now 
redistributed among various parts of the Nation on the basis 
of census data. If we fail to establish the proposition that 
those data are as accurate and complete as the techniques 
of enumeration and estimation can make them, the ensuing 
politicization will damage the census even as it jeopardizes the 
objectivity on which these myriad formulas tenuously rest. 
We will seek to minimize the undercount, but we will 
never eliminate it. The challenge to this conference, and 
to the Secretary of Commerce and the Bureau of the Census 
in its aftermath, is to do the next best thing: To make every 
effort that can be made within the bounds of sound statis- 
tical methodology to estimate the undercount and to publish 
the results of those estimates. 

Considerations I 

Can Regression Be Used To Estimate 
Local Undercount Adjustments? 

Eugene P. Ericksen 

Temple University 


Census undercount has become a public issue in 1980. 
The political and fiscal implications of the possibility that 
levels of undercount vary between places have increased
pressure on the Census Bureau to compute local estimates 
of the number of people missed in the decennial census. 
When considering whether and how to do this, the Bureau 
needs to find a method of establishing whether the under- 
count does vary between places, whether the variations are 
systematic, and whether the nature of these variations can be 
estimated. Doing this successfully appears to require match- 
ing administrative records against census forms for large 
probability samples of States and large metropolitan areas. 
The administrative records must enumerate the uncounted 
population, the matching procedure must be feasible, and the 
sampling design must be sufficiently precise to provide re- 
liable estimates for a variable whose actual values between 
places will not vary greatly. Descriptions of the Census 
Bureau's plans are given in [1]. As they describe them, 
the procedures appear to be neither easy nor cheap, the 
main problem being one of matching people at the same 
address. Computing these sample estimates of local 
undercount (a most difficult procedure) is the critical 
link in the computation of undercount estimates. Once 
the sample estimates are at hand, we have a variety of 
techniques using symptomatic indicators to compute final 
estimates. Leaving the difficult problem of actually com- 
puting the sample estimates to the Bureau, my objective 
here is to discuss ways to combine the estimates with aux- 
iliary information to derive the final estimates of local
undercount.
A review of recent papers by Census Bureau personnel [1 , 
15, 18] indicates a recognition of the statistical and political 
problems associated with undercount estimates. The under- 
count adjustment procedure needs to be statistically sound 
and politically credible. If the Census Bureau is going to 
depart from its tradition of publishing figures based only on
people actually counted, it must have firm and noncontro- 
versial ground to stand on. The undercount adjustment 
procedure should be correct and easily understood, and be 
capable of defense by knowledgeable statisticians. The need 
to withstand legal challenges calls for a sound but conserva- 
tive procedure rather than a potentially more perfectible one. 

There are many ways to classify undercount adjustment 
procedures, and I would like to discuss two. One classifica- 
tion sorts procedures into "individualistic" and "ecological" 
categories. Most individualistic procedures would result in a 

synthetic estimate [10, 13; but see 12 and 14 for useful 
evaluations and comparisons with other methods]. One first 
estimates individual likelihoods of being missed. This would 
be done for aggregates of individuals classified by demo- 
graphic variables like age, race, and sex; a good example of 
this is the national undercount estimates given in [16]. 
Multiple regression, as illustrated in [3] , could be used to 
incorporate more variables into the estimate, though log- 
linear models [2] may be a more appropriate technique. 
Once the rates are computed for subgroups of individuals, 
these would be assumed to be constant across localities, and 
the subgroup-specific rates would be applied to the demo- 
graphic structure of each areal unit to provide an overall 
estimate of undercount for that place. The approach is 
intuitively plausible, and will work unless there are peculiarly 
local effects that cannot be incorporated into the estimate. 
Two examples of such local effects are: (1) If age-race-sex 
estimates such as those computed by Siegel are applied to the 
age-race-sex distributions of all localities, but both blacks and 
whites are less likely to be counted in central cities than 
suburbs, and (2) if there are variations in the efficiency with 
which local census offices collect forms in "hard-to-count"
areas.
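
The mechanics of a synthetic estimate can be sketched in a few lines of Python. The subgroup adjustment ratios and the city's counts below are hypothetical round numbers, not figures from [16] or any census tabulation:

```python
# Synthetic estimate of local undercount: apply nationally estimated
# subgroup adjustment ratios (true/counted) to a locality's demographic
# structure.  All ratios and counts are illustrative, not census figures.

national_ratio = {
    "black_male_25_44": 1.10,
    "black_female_25_44": 1.04,
    "white_male_25_44": 1.02,
    "white_female_25_44": 1.01,
}

def synthetic_adjusted_count(counted_by_subgroup):
    """Sum the subgroup counts, each inflated by its national ratio."""
    return sum(counted_by_subgroup[g] * national_ratio[g]
               for g in counted_by_subgroup)

city = {"black_male_25_44": 50_000, "black_female_25_44": 55_000,
        "white_male_25_44": 40_000, "white_female_25_44": 42_000}

adjusted = synthetic_adjusted_count(city)
counted = sum(city.values())
print(round(adjusted / counted, 4))   # → 1.045, the city's synthetic ratio
```

The subgroup rates are assumed constant across localities, which is exactly the assumption the two local-effect examples above call into question.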
By contrast, with the "ecological approach" we don't 
attempt to compute rates for individuals, but simply com- 
pute estimated rates for localities and look for systematic 
variations in these aggregated rates using a procedure like 
linear regression. We might find that estimated rates for local- 
ities increase where there are large black populations, the 
locality is a central city, or there is a large proportion of 
multihousing-unit structures. Such an estimate would not 
imply that blacks were harder to count, simply that locali- 
ties where blacks lived had a greater undercount. The missed 
population could, for example, be Puerto Ricans. 
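
A minimal sketch of the ecological approach, using fabricated area-level data in place of real sample estimates (the assumed coefficients 1.01 and 0.05, and the noise level, are illustrative only):

```python
import random

# Ecological regression sketch: regress area-level undercount ratios on an
# aggregate characteristic (here, proportion black).  Data are fabricated.
random.seed(0)
n = 40
pct_black = [random.uniform(0.0, 0.4) for _ in range(n)]
# Assumed underlying relation plus sampling noise in the area estimates:
r_hat = [1.01 + 0.05 * p + random.gauss(0, 0.005) for p in pct_black]

mean_x = sum(pct_black) / n
mean_y = sum(r_hat) / n
sxx = sum((x - mean_x) ** 2 for x in pct_black)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pct_black, r_hat))
slope = sxy / sxx                      # estimated effect of percent black
intercept = mean_y - slope * mean_x
print(round(slope, 3), round(intercept, 3))
```

Note that the fitted slope describes areas, not persons: it says areas with more black residents have higher estimated undercount, not that black individuals are harder to count.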

Another way to classify methods is into "simple" and 
"complex" categories. Both synthetic and regression esti- 
mates are complex, as their credibility rests on lists of 
assumptions concerning distributions of variables and rela- 
tionships among variables. The set of sample estimates for 
localities is an obvious candidate for a simple estimate. Let 
us consider their use, specifying the variable in question to 
be the ratio of total to enumerated population and assum- 
ing that the estimates have values perceptibly greater than 
1.000, that the coefficient of variation is 0.005 or less, and 
that nonsampling errors are minimal. In other words, the 
procedures described in [1] have been successfully applied. 

These estimates would work for States. This is important 
because States are the first level of funding for Federal 



revenue-sharing programs, and grants to States comprise a 
large proportion of such spending. The estimates would fare 
less well below the State level. Even if good sample estimates 
were computed for large metropolitan areas, since these are 
not governmental units, we would have only one estimate 
to be applied to all jurisdictions within the standard metro- 
politan statistical area (SMSA). Since we expect to find rates 
of undercount higher in largely black central cities than pre- 
dominantly white suburbs, a single adjustment for both is 
implausible and will do little to alleviate the political pressure 
calling for undercount adjustments. 

Secondly, the sample estimates are not easy to compute. 
If the sample estimates are derived through a particularly 
laborious procedure, they may no longer qualify as "simple" 
estimates, and estimating their error structure could be prob- 
lematic. Records from the Internal Revenue Service are to be 
used in the administrative record matching and we know that 
tax returns miss people, especially the poor and/or mobile 
who are not likely to be counted in a census. The record- 
keeping systems necessary for triple system estimation are 
horrendous to maintain, and matching problems add a non- 
sampling error component to the mean squared error of the 
sample estimates. Thirdly, even if the mean squared error is 
random and small, some larger errors will occur by chance 
alone. If 5 percent of the sample estimates are two standard 
deviations away from true values, and this is an unacceptable 
level of error, not knowing which observations are the 5 
percent weakens the credibility of all estimates. 

Identifying the outliers requires an auxiliary estimate
which is independent of the sample estimate [7] , and the 
synthetic and regression estimates we have discussed are two 
prime candidates. A good review of available auxiliary esti- 
mates is given in [14] . The auxiliary estimates are needed at 
least as a check on the sample estimates. If the errors of the 
auxiliary estimates are equal to or smaller than those of the 
sample estimates, the auxiliary estimates would be preferred, 
since they can be applied to a wider variety of places. 
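
Such a consistency check might look like the following sketch, where the sample and auxiliary estimates and their standard errors are illustrative values:

```python
import math

# Flag sample estimates that differ from an independent auxiliary estimate
# by more than two standard errors of the difference.  Values illustrative.
areas = {
    "A": (1.021, 1.019),   # (sample estimate, auxiliary estimate)
    "B": (1.045, 1.023),
    "C": (1.030, 1.028),
}
se_sample, se_aux = 0.005, 0.004
# Independence of the two estimates lets their error variances add:
se_diff = math.sqrt(se_sample ** 2 + se_aux ** 2)

outliers = [name for name, (s, a) in areas.items()
            if abs(s - a) > 2 * se_diff]
print(outliers)   # → ['B']
```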

They have several advantages. Local sampling fluctuations 
are absent. We can use the sample estimates to test the good- 
ness of fit of alternative models, and the model selection 
process will help us to learn about the conditions where 
census undercount is more or less likely to occur. The ex- 
planation provided by the selected model increases the 
credibility of the model. We can use the sample estimates to 
estimate the error of the auxiliary estimates and make an 
empirical case for the selected model based on minimizing 
average squared error. 

This lessens the main disadvantage of synthetic and re- 
gression estimates, which is that they are biased. Regardless 
of sample size, the expected value of a synthetic or regression 
estimate for a locality is unlikely to equal the expected value 
of an unbiased sample estimate. The extent that local condi- 
tions are atypical is important, and though testing for the 
goodness of fit can indicate that the bias is small in general, 
it can be large in a particular case. This problem of using 

regression for local estimation has been illustrated for 1970 
population estimates for counties [6] . In general, errors were 
low, but some very large errors were obtained for counties 
with unusual age distributions.


Since we don't expect to rely on sample estimates alone, 
we need to use the auxiliary information to select a model of 
undercount. We need a parsimonious model to sell the esti- 
mates, which means that we don't choose a complex model 
unless there is convincing evidence that a simpler one should 
be ruled out. Here are three models, which can be arranged 
in order of increasing complexity, that can be considered: 
(1) All areas assumed to have equal levels of undercount; (2) 
the undercount of an area can be accounted for by its demo- 
graphic structure and by national undercount rates estimated 
separately for demographic categories like age, race, and sex; 
and (3) there are local variations possibly to be accounted for 
by regression. If the sample data fit model (1), we select it. 
Failing this, we give preference to model (2) over model (3). 

Let us introduce notation to describe the models. Each
person is in a demographic subgroup i and lives in areal unit
j in block k. We have a simple random sample of n blocks
in each of the areal units and each block is enumerated
completely. For each i, j, k we compute:

x_ijk = the number of people in subgroup i in block k
of areal unit j who are counted in the census, and

y_ijk* = the adjusted count after matching procedures
have been completed, with

r_ijk* = y_ijk*/x_ijk.

We also write x.j = the total census count in areal unit j,
and we would like to know the comparable value of y.j.
We write x.., the census count for the Nation, and y.., the
adjusted count for the Nation. We assume that y.. equals the
adjusted count that would be derived using procedures given
in [16]. Under model (1), we would estimate the adjusted
count for each block to be

y.jk' = (x.jk)(r..)

where r.. = y../x... The adjusted count for areal unit j
would be:

y.j' = (x.j)(r..)

Since we have a simple random sample of blocks in each areal
unit, we can perform an analysis of variance with the areal
units specified as treatments and the blocks as observations.
We compute:

r.jk* = y.jk*/x.jk, where y.jk* = Σ_i y_ijk* and x.jk = Σ_i x_ijk,

and perform the analysis of variance on the r.jk*.
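
The analysis of variance under model (1) can be sketched as follows. The block-level ratios are fabricated, with one unit given a deliberately higher level so that the F test rejects homogeneity:

```python
import random

# One-way ANOVA sketch: do block-level adjustment ratios differ
# systematically between areal units?  Block data are fabricated.
random.seed(1)

def blocks(level, m=30, sd=0.004):
    """Simulate m block ratios scattered around a unit's true level."""
    return [random.gauss(level, sd) for _ in range(m)]

units = [blocks(1.020), blocks(1.022), blocks(1.045)]  # last: harder to count

k = len(units)
n = sum(len(u) for u in units)
grand = sum(sum(u) for u in units) / n
means = [sum(u) / len(u) for u in units]
ss_between = sum(len(u) * (m - grand) ** 2 for u, m in zip(units, means))
ss_within = sum(sum((x - m) ** 2 for x in u) for u, m in zip(units, means))
F = (ss_between / (k - 1)) / (ss_within / (n - k))
print(F > 3.10)   # 3.10 is roughly the 5-percent critical value of F(2, 87)
```

If the unit levels were all equal, F would hover near 1 and model (1) could not be rejected.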

Legal and financial decisions based on decennial census 
counts of local populations have been made as if model (1) 
were correct. A clear establishment of the fact of differential 
undercount is a necessary justification for the computation 
and use of local area estimates. Since we know that the 
undercount varies across demographic groups and that demo- 
graphic structures of places also vary, we expect to reject 
model (1). To be sure that our analysis of variance test 
fairly rejects the null hypothesis associated with this model, 
we should replicate it for small subgroups of States and 
metropolitan areas grouped by region or some other char- 
acteristic. Undercount adjustments should not be computed 
if place-to-place variations are small. 

This done, we next test model (2). A claim for the correct- 
ness of model (2) would state, for example, that blacks are 
consistently harder to count than whites in the same juris- 
diction. The counterargument would claim that racial differ- 
entials result from the nature of where blacks and whites live, 
with blacks being concentrated in areas that are hard to 
enumerate. In such areas, they may be no harder to count 
than nonblacks. Model (2) can be tested using analysis of
covariance.

We write the following:

y.jk'' = Σ_i (x_ijk)(r_i), the synthetic estimate for block k
of areal unit j formed by applying the national adjustment
ratio r_i for subgroup i to the block's census counts, and

ȳ.j'' = Σ_k y.jk''/n, its mean over the sampled blocks of
areal unit j.

We can use an analysis of covariance model [17, pp. 420-425]
to express the y.jk* as follows:

y.jk* = μ_j + b(y.jk'' − ȳ.j'') + e_jk

where

μ_j = the effect of location in areal unit j once the
effect of the synthetic estimate y.jk'' has been
removed,

b = the regression coefficient expressing the effect
of y.jk'' on the values of y.jk* once the locality
effects have been removed, and

e_jk = a random error term.

We compute a pooled estimate of b over the samples of
blocks within each areal unit and use as an estimate of the
effect of location in areal unit j:

μ̂_j = ȳ.j* − b(ȳ.j'' − ȳ..'')

An F test of the equality of the μ_j's can then be computed.
If the test fails, we conclude that a model whereby there is
no systematic set of local "effects" is not inconsistent with
the data. In other words, once we know what the synthetic
estimates for the sample of blocks are, we gain no additional
information from knowing the identity of the areal unit j.
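
A sketch of the pooled-slope computation, with fabricated block data; the slope b and the locality effects follow the covariance model described above, and the third unit is given an extra local effect of 12 so that the locality effects differ:

```python
import random

# Analysis-of-covariance sketch: pool the within-unit slope b of adjusted
# block counts y* on synthetic block estimates y'', then form adjusted
# locality effects mu_j.  All data are fabricated.
random.seed(2)

def make_unit(local_effect, m=25):
    y_syn = [random.uniform(300, 700) for _ in range(m)]   # synthetic y''
    y_star = [local_effect + 1.02 * s + random.gauss(0, 5) for s in y_syn]
    return y_syn, y_star

units = [make_unit(0.0), make_unit(0.0), make_unit(12.0)]

def mean(v):
    return sum(v) / len(v)

# Pool the within-unit cross-products over units to get the common slope.
sxy = sxx = 0.0
for ys, yt in units:
    ms, mt = mean(ys), mean(yt)
    sxy += sum((s - ms) * (t - mt) for s, t in zip(ys, yt))
    sxx += sum((s - ms) ** 2 for s in ys)
b = sxy / sxx

grand_syn = mean([s for ys, _ in units for s in ys])
mu = [mean(yt) - b * (mean(ys) - grand_syn) for ys, yt in units]
print(round(mu[2] - mu[0]))   # roughly recovers the third unit's extra effect
```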

If we reject both the tests, our task becomes that of iden- 
tifying other relevant factors affecting the undercount. The 
regression-sample data procedure [4, 5] provides a way 
of doing this, and it can be used on its own or along with 
synthetic estimation. Any synthetic estimate can be 
tested by analyzing its correlation with the sample esti- 
mates or by including it in a regression equation along with 
other predictors observed to be related to the sample esti- 
mates of undercount. 

The unit of sampling becomes an issue for regression esti-
mates. Current census plans appear to call for the sampling
of large metropolitan areas and those States and remainders
of States outside these SMSA's. Because these areas, agglom-
erations of many local jurisdictions, tend to be hetero-
geneous with respect to race, the variance of r.j, the ratio of
adjusted to counted population, is likely to be minimal. To
illustrate, suppose that 8 percent of blacks and 2 percent of
nonblacks are missed in the census count. If we apply these
rates to the 1970 racial distributions of States and assume
that no other factors influence the undercount, we derive
values of r.j ranging from 1.020 (9 States each of whose pop-
ulation is less than 1 percent black) to highs of 1.042 in
Mississippi and 1.038 in Louisiana and South Carolina. The
mean of this distribution is 1.025 and its variance is 30.52 x
10⁻⁶. A similar calculation for the 33 SMSA's with over 1
million population in 1970 gives a range of 1.021 (four
SMSA's) to 1.039 (New Orleans) and 1.037 (Baltimore),
a mean of 1.027 and a variance of 22.24 x 10⁻⁶.
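
The calculation just described can be reproduced directly. The function below applies the stated 8- and 2-percent miss rates; the two population shares are round hypothetical values rather than actual 1970 distributions:

```python
# Ratio of true to counted population when 8 percent of blacks and
# 2 percent of nonblacks are missed: counted = [0.92p + 0.98(1-p)] * true,
# so r = 1 / [0.92p + 0.98(1-p)], where p is the black population share.
# The shares below are round, hypothetical values.

def r_ratio(p_black, miss_black=0.08, miss_other=0.02):
    counted_share = ((1 - miss_black) * p_black
                     + (1 - miss_other) * (1 - p_black))
    return 1.0 / counted_share

shares = {"no_blacks": 0.00, "one_third_black": 0.333}
ratios = {k: round(r_ratio(p), 3) for k, p in shares.items()}
print(ratios)   # → {'no_blacks': 1.02, 'one_third_black': 1.042}
```

The narrow spread between these two extremes illustrates why the variance of r.j across heterogeneous units is so small.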

An alternative plan would subdivide these agglomerations
into more homogeneous units. For example, instead of
having the 33 equivalent samples of metropolitan areas, we
would have 33 samples of central cities, 33 samples of their
suburbs, and a total of 66 units of observation, each with
comparable, though smaller, sample sizes. Cutting the sample
size in half will double the sampling variance, but this could
be more than compensated for by the increased explanatory
power of the predictor variables. We repeated our calcula-
tions based on the 8- and 2-percent noncoverage rates for the
66 central city/suburban units and obtained a range of 1.020
(6 suburbs) to 1.063 (Washington, D.C.) and 1.052 (Newark),
a mean of 1.029 and a variance of 89.06 x 10⁻⁶, four times
that obtained for the 33 SMSA's. If it were feasible to com-
pute, a regression equation based on the 66 units would be
applicable to a greater range of places than would a regres-
sion equation based on the 33 units. This is because we could
evaluate a greater range of values of racial composition and
could explicitly estimate the effects of central city location.
Let us now discuss the methodology of regression in more
detail.


The methodology of the regression procedure consists 
of first obtaining a set of sample estimates for the areal units 
under consideration and then estimating their variances. We 
next obtain a set of variables thought to be related to the 
dependent variable under consideration, and compute a re- 
gression equation using these auxiliary variables as predictors. 
The auxiliary variables can be, and often are, alternative esti- 
mates of the dependent variable in question, and correlations 
between the predictors and true, but unobserved, values of 
the dependent variable have often been between 0.90 and 
1.00 when evaluations on test data have been made. Be- 
cause of the within primary sampling unit (PSU) error, the 
observed correlations are lower, but if the usual assumptions 
of regression analysis are met, the regression coefficients 
obtained are unbiased estimates of the coefficients that 
would be obtained if the true values of the dependent vari- 
able were available. The mean squared error of the regres-
sion estimate can be written

MSE = [(n − p − 1)/n] σ_u² + [(p + 1)/n] σ_w²

where

σ_u² = the variance of the true, but unobserved, values
not explained by regression,

σ_w² = the sampling variance of the sample estimates,

n = the number of observations, and

p = the number of predictor variables.

The σ_u² can be written as σ_u² = σ_y²(1 − R²), where σ_y²
is the original variance of the true values and R² is the coef-
ficient of determination that would be obtained were the
true values available. Our expression for the mean squared
error is not to be confused with the mean squared difference
between regression and sample estimates, which is:

Σ(Ŷ − Y)²/n = [(n − p − 1)/n] (σ_u² + σ_w²)

It should be noted that:

MSE = [Σ(Ŷ − Y)²/n] − [(n − 2p − 2)/n] σ_w²
We can readily see that the mean squared error of the re-
gression estimates is less than the mean squared error of the
sample estimates if σ_u² < σ_w². For estimating the census
undercount, if the variances of the sample estimates, σ_w², are
unexpectedly high, regression becomes a more attractive
alternative, since the underlying relationships between vari-
ables are not affected. If the correlations between the synthe-
tic estimates used as predictor variables and the true values
are as high as we would reasonably expect them to be, we would
almost certainly pick regression over sample estimates.
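
This comparison can be checked numerically with the mean squared error expression given above (the variance values, in units of 10⁻⁶, are illustrative):

```python
# MSE of the regression estimate: [(n-p-1)/n]*su2 + [(p+1)/n]*sw2,
# versus the MSE of the sample estimate itself, which is simply sw2.
# Variance values are illustrative, in units of 1e-6.

def mse_regression(n, p, su2, sw2):
    """Mean squared error of a regression estimate with p predictors."""
    return ((n - p - 1) / n) * su2 + ((p + 1) / n) * sw2

n, p = 33, 1
sw2 = 25.0           # sampling variance of the sample estimates
for su2 in (5.0, 25.0, 60.0):
    reg = mse_regression(n, p, su2, sw2)
    print(su2, round(reg, 2), reg < sw2)   # regression wins when su2 < sw2
```

The middle case, σ_u² = σ_w², is the break-even point: the two mean squared errors coincide exactly.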

Our equation for the mean squared error can be used to 
evaluate choices to be made in computing regression esti- 
mates. Among these choices are: 

(a) When is it advisable to add additional predictor vari- 
ables to the regression equation? 

(b) Is it better to use fewer observations with smaller 
sampling variances or more observations with larger 
sampling variances? 

To illustrate how these choices might be made, we present 
(table 1) examples of mean squared error computations. 
These have been computed for a variety of circumstances. 
We assume a good synthetic estimate to be available as the 
first predictor variable. Sample correlations of this with the 
actual (but unobserved) values of the ratios of the true to 
counted populations are given in the first column, followed 
by estimates of the correlations that would actually be 
observed for three different values of the sampling variances. 
The lowest, 25 x 10⁻⁶, is approximately what I understand
to be the goal of the Census Bureau's sample design. Esti- 
mates of the mean squared error and its components under 
these conditions are then given. This is followed by esti- 
mates of increases in the coefficient of determination with 
the true (but unobserved) values, which would have to be 
obtained to make it useful to add a second predictor variable. 
The exercise is then repeated with the number of observa- 
tions and sampling variances each doubled. The variances, 
σ_y², of the underlying values of r.j are multiplied by four
in an attempt to replicate what would happen if we used 
66 central city/suburban units instead of 33 SMSA's. The 
results are given in table 1 and can be summarized as follows: 

1. The observed correlations are dampened substantially 
by the sampling variances. They appear in fact to be 
affected more by the size of the sampling variances 
than by the strength of the underlying relationships. 
But, observing a low correlation doesn't mean that 
regression has failed. We need to look at the mean 
squared error to make that judgment. 

2. For given values of o 2 , increases in a 2 led to a larger 
mean squared error of the regression estimates. The 
gains of regression relative to sample estimates are 
greater when the sampling variances are large. In only 

one case, where a, 2 = 100 x 10~ 6 , r 2 = .500, and 

2 -6 

a = 50 x 10 , is the mean squared error of the 


Table 1. Mean Squared Errors of Regression Estimates Given Various Combinations of Variances and Sample Sizes 

a 2 =25x10" 6 , n = 33, p= 1 

r 2 



r 2 



Mean squa 


If p increased to 2, 
R 2 now needed to 
obtain same mean 

(x 10 6 ) 

(x 10 6 


(x 10 6 ] 

squared error 1 


variance x 10 6 




25 100 






100 400 






1 .4 5.9 












2.1 5.8 






.92 NP 






1.1 5.6 






.82 NP 






0.8 5.3 






.61 .98 

a. 2 = 100 x 10 6 , n = 66, p= 1 


variance x 10 6 






800 50 













24.2 3.9 













24.1 6.4 













24.0 8.8 













23.9 11.2 













23.6 20.9 













23.3 30.6 













22.7 50.0 






NOTE: The mean square 

d error of the re< 


t estimates can be written as: MSE = a 

2 +(p 

+ 1)(ct '■ 

-o 2 ). 

1 This refers to the same mean squared error obtained for p = 1 when a second predictor variable is added to the regression equa- 
tion. Where "NP" is indicated, the correlation would have to be greater than 1 .0, so it is impossible to obtain the same mean squared 

regression estimate as great as a 2 and doesn't exceed 

3. Reductions in σ² are offset by increases in the 
(p + 1)σ_w²/n term when predictor variables are added 
to the regression. Looking at the last three columns of 
the table, we find that the underlying values of R² 
needed to reduce the mean squared error are bigger 
when the sampling variances are larger. In some cases, 
when the first predictor variable is strongly related to 
the dependent variable, there is no gain from adding 
predictors. Should this occur, we may want to consider 
simply using the synthetic estimate, if it is in fact the 
first predictor, since it will not include the sampling 
error component of the mean squared error of the 
regression estimate. 

4. The choice between the 33 and 66 areas depends on 
the relative mean squared errors. We can see that it is 
easily possible for reductions in σ² to offset the in- 
creased σ_w² when we have more observations, each 
with increased sampling variance. The fact that a re- 
gression equation based on a larger number of more 
homogeneous units can be applied more flexibly will 
induce us to select it if the mean squared errors are 
comparable to those obtained with fewer but more 
heterogeneous units. The greater sampling variances 
need not deter us from this choice. 
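The trade-offs in these four findings can be sketched numerically. The decomposition used below, MSE ≈ (1 − ρ²)σ_b² + (p + 1)σ_w²/n, is a simplification assumed here for illustration (it is consistent with the (p + 1)σ_w²/n factor discussed in the text, but it is not taken from table 1), and every numerical value is invented.

```python
# Illustrative sketch, not the paper's exact computation: compare the
# approximate mean squared error of a regression estimate with the
# sampling variance of the direct sample estimate.  Assumed decomposition:
#   MSE_reg ~ (1 - rho2) * sigma_b2  +  (p + 1) * sigma_w2 / n
# sigma_b2: variance of the true values across areas
# sigma_w2: sampling variance of one area's direct estimate
# rho2:     underlying squared correlation with the predictors
# p:        number of predictor variables

def mse_regression(sigma_b2, sigma_w2, rho2, n, p):
    """Approximate MSE of the regression estimate for one area."""
    return (1.0 - rho2) * sigma_b2 + (p + 1) * sigma_w2 / n

# 33 heterogeneous SMSA's with small sampling variances versus 66 more
# homogeneous central city/suburban units with larger sampling variances
# (all values hypothetical, in the table's units of 10**-6).
cases = [
    ("33 SMSA's", 100e-6, 25e-6, 33),
    ("66 central city/suburban units", 50e-6, 100e-6, 66),
]

for label, sigma_b2, sigma_w2, n in cases:
    reg = mse_regression(sigma_b2, sigma_w2, rho2=0.8, n=n, p=1)
    print(f"{label}: regression MSE = {reg:.2e}, sample MSE = {sigma_w2:.2e}")
```

With these invented values, regression beats the direct sample estimate in both configurations, and raising p increases the (p + 1)σ_w²/n penalty, which is the limit on the number of predictors discussed next.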


The use of regression is not without problems. First, the 
(p + 1)σ_w²/n factor limits the number of predictor variables 
that can be used. Since the likelihood of being missed in the 
census is no doubt influenced by many factors, our model 
will be overly simple. In 1970, while we had seven indicators 
known to be related to population growth, we were only 
able to utilize four of them in a regression equation esti- 
mating 1960-1970 population growth [5, 6]. This was for 
389 observations with a larger variance of the true 
values than is likely to obtain for our dependent variable 
here. For a more similar application, estimating unemploy- 
ment rates for SMSA's, Gonzalez, Hoza, and I [6, 11] had 
122 observations, and the sampling variances were large. 
The best results were obtained with only two of six avail- 
able predictors being used. 

A second important problem comes from the small number 
of observations. Regression equations depend on a set 
of assumptions that can be spoiled in an application such 
as this one. In particular, we must look for possible relation- 
ships between the errors of the sample estimates and the 
true values. Does the sampling variance of the undercount 
estimate get larger where undercount is greater? This could 
occur if high undercount areas are concentrated in a few 
neighborhoods in cities. Are there observations with par- 
ticularly large deviations from the regression line? Use of 
regression to estimate both population and unemployment 
[7, 11] has shown the effects of outliers due to measure- 
ment error to be substantial. The need to examine the 
assumptions of linear regression is important here. 

This problem is related to that of the universe of gen- 
eralization that occurs when equations are computed on 
one kind of unit and estimations are wanted for another 
kind. The problem, pointed out by Fay [8], was illu- 
strated in our 1970 population estimates for counties 
[5] . There, the regression equation was computed on 
PSU's from the Current Population Survey, which were 
heavily weighted toward urban areas. Most counties are 
rural and have small populations. The equations based on 
PSU's were less accurate for estimating population growth 
for counties with smaller populations. Thus, a regression 
equation using central cities and suburbs as units is preferable 
to one using SMSA's. But if undercount estimates are wanted 
for counties and other smaller jurisdictions, we would prefer 
a regression equation computed from a sample of at least 
similar units. The problem of generalization also occurs in a 
second way. If we have 50 States, there may be only a 
handful with a given characteristic, say a large Hispanic 
population, that influences the level of undercount. The 
influence could be important for these few States, but 
including the characteristic as a variable in regression might 
fail to give a substantial enough increase in the underlying 
R² to offset the added error of the sampling variance component. 

With these problems, how should we proceed? I suggest 
two strategies. One is to spend some time perfecting a 
synthetic estimate. We will have a very large national sample 
of households with which to estimate the effects of a large 
set of variables on the individual likelihoods of being missed 
by the census, using regression and log-linear techniques. 
Individual probabilities, based on a large number of factors, 
can be computed, and these can be aggregated over the 
demographic structure of a place to give an adjusted count. 
The goodness of fit of such estimates could be tested by 
analysis of covariance. These synthetic estimates could be 
used as predictor variables in regression along with other 
predictor variables. Should another predictor appear to be 
important, we might then revise the synthetic estimate. For 
example, if we had a two-variable equation with two im- 
portant predictors, an age-race-sex synthetic estimate and a 
dummy variable indicating location in a central city, we 
might then try an age-race-sex-central city or other synthetic 

estimate. The availability of both synthetic and regression 
procedures gives us a great deal of flexibility in computing 
auxiliary estimates, and the presence of sample estimates 
allows us to evaluate their goodness of fit. 
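The synthetic-estimation strategy described here can be sketched as follows. Everything in the sketch is hypothetical: the logistic form and the coefficients stand in for whatever regression or log-linear model would actually be fit to the national sample.

```python
import math

# Hypothetical sketch of the synthetic-estimate idea: model each person's
# probability of being missed, then aggregate the probabilities over a
# place's demographic structure.  The coefficients are invented; in
# practice they would come from a regression or log-linear fit to the
# large national sample of households.
COEF = {"intercept": -3.5, "black": 1.4, "male": 0.3, "under_35": 0.4}

def miss_probability(black, male, under_35):
    """Logistic model for an individual's probability of being missed."""
    z = (COEF["intercept"] + COEF["black"] * black
         + COEF["male"] * male + COEF["under_35"] * under_35)
    return 1.0 / (1.0 + math.exp(-z))

def adjusted_count(cells):
    """cells: (counted persons, black, male, under_35) per demographic cell.
    Inflate each cell by counted / P(counted)."""
    return sum(count / (1.0 - miss_probability(b, m, u))
               for count, b, m, u in cells)

# A place with two illustrative demographic cells.
place = [(9000, 0, 0, 0), (1000, 1, 1, 1)]
print(round(adjusted_count(place)))  # larger than the 10,000 counted
```

The adjusted count for a place is then the sum of its cell counts inflated by the estimated probability of being counted, which is the aggregation step described in the paragraph above.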

Secondly, how should we deal with discrepancies between 
sample and regression or synthetic estimates? In general, if 
the mean squared error of the regression estimates is less 
than the sampling variances, we would reject those cases 
where sample estimates were greatly different from regression 
estimates and recompute the regression equation [7, 11]. 
When this was done for 1970/1960 population growth ratios, 
the mean squared error of the regression estimates was re- 
duced by 17 percent with four predictors and 3 of 389 
observations removed. When this was done for 1970 unem- 
ployment rate estimates, the mean squared error was reduced 
by 16 percent with two predictors and 6 of 122 observations 
removed. 
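The outlier-rejection-and-refit step just described can be sketched with a one-predictor least-squares fit; the data below are invented, not the 1970 observations.

```python
# Sketch of the outlier-rejection step: fit a least-squares line, drop
# observations whose residuals are extreme, and refit.  Illustrative
# one-predictor version with invented data.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def drop_outliers(xs, ys, k=2.0):
    """Remove points whose residual exceeds k residual standard deviations."""
    a, b = fit_line(xs, ys)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = (sum(r * r for r in resid) / len(resid)) ** 0.5
    kept = [(x, y) for x, y, r in zip(xs, ys, resid) if abs(r) <= k * s]
    return [x for x, _ in kept], [y for _, y in kept]

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 30.0, 8.1]   # one gross outlier at x = 7
xs2, ys2 = drop_outliers(xs, ys)
a2, b2 = fit_line(xs2, ys2)
print(len(xs2), round(b2, 2))   # the refitted slope is close to 1
```

Dropping the single gross outlier restores a slope near the one generating the clean points, which is the kind of mean-squared-error reduction reported in the text.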

When the mean squared errors of regression and synthetic 
estimates are comparable, James-Stein weighting procedures 
[8, 9] are appropriate. We still need to resolve the issue of 
what happens when the two estimates are far apart. There 
is some feeling that the sample estimate should be given 
precedence by not allowing the weighted estimate to differ 
from it by a specified amount [8, p. 172] , say, one standard 
deviation of the sample estimate. 
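The compromise suggested here can be sketched directly. The inverse-variance weighting below is a simplified stand-in for the James-Stein procedures of [8, 9], the one-standard-deviation clamp is the precedence rule just mentioned, and all numbers are invented.

```python
# Simplified sketch (not the actual James-Stein estimator): weight the
# sample and regression/synthetic estimates inversely to their error
# variances, then keep the result within clamp_sd standard deviations
# of the sample estimate.

def weighted_estimate(sample, other, var_sample, mse_other, clamp_sd=1.0):
    w = mse_other / (mse_other + var_sample)   # weight on the sample estimate
    est = w * sample + (1.0 - w) * other
    sd = var_sample ** 0.5
    return min(max(est, sample - clamp_sd * sd), sample + clamp_sd * sd)

# Sample estimate 6.0 (variance 4.0, sd 2.0); regression estimate 1.0 with
# comparable MSE.  The unclamped average is 3.5; the clamp pulls it back
# to within one sd of the sample value.
print(weighted_estimate(6.0, 1.0, var_sample=4.0, mse_other=4.0))   # 4.0
```

When the two estimates are close, the clamp is inactive and the result is the ordinary precision-weighted average.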

Selection of an estimate must of course be done after 
examination of the empirical evidence. At this point, I favor 
a computation of a synthetic estimate based on as many 
factors as the individual-level regression or log-linear analysis 
of individual likelihoods of being uncounted indicates to be 
relevant. Such estimates for large areas can be evaluated using 
regression, as we indicated in the previous section. Relating 
the final estimates to individual likelihoods of being counted 
should increase the credibility of final estimates. If the mean 
squared error is comparable to the sampling variance, we could 
consider a weighted average with the sample estimate, al- 
though the increased complexity of the estimate might offset 
the advantages of allowing data actually collected in the 
area for which the estimate is computed to influence this 
estimate. 


1. Bateman, David V., and Cowan, Charles D. "Plans for 
1980 Census Coverage Evaluation." Paper presented at 
the August 1979 meetings of the American Statistical 
Association, San Diego, Calif. 

2. Bishop, Yvonne M.; Fienberg, Stephen E.; and Holland, 
Paul W. Discrete Multivariate Analysis. Cambridge, 
Mass.: The MIT Press, 1975. 

3. Cohen, Reuben. "Drug Abuse Applications: Some 
Regression Explorations with National Survey Data." 
National Institute on Drug Abuse, Research Monograph 
24. Washington, D.C.: U.S. Government Printing Office, 
1979. 

4. Ericksen, Eugene P. "A Method for Combining Sample 
Survey Data and Symptomatic Indicators to Obtain Pop- 
ulation Estimates for Local Areas." Demography, 10 
(1973), 137-160. 

5. Ericksen, Eugene P. "A Regression Method for Esti- 
mating Population Changes of Local Areas." Journal of 
the American Statistical Association, 69 (1974), 867-875. 

6. Ericksen, Eugene P. "Population Estimation in the 
1970's: The Stakes are Higher." Report submitted to the 
U.S. Bureau of the Census, 1975. 

7. Ericksen, Eugene P. "Outliers in Regression Analysis 
when Measurement Error is Large." Proceedings of the 
American Statistical Association, Social Statistics 
Section, 1975, 412-417. 

8. Fay, Robert E. "Some Recent Census Bureau Applica- 
tions of Regression Techniques to Estimation." National 
Institute on Drug Abuse, Research Monograph 24. Wash- 
ington, D.C.: U.S. Government Printing Office, 1979. 

9. Fay, Robert E., and Herriot, Roger A. "Estimates of In- 
come for Small Places: An Application of James-Stein 
Procedures to Census Data." Journal of the American 
Statistical Association, 74 (1979), 269-277. 

10. Gonzalez, Maria, and Waksberg, Joseph. "Estimation of 
the Error of Synthetic Estimates." Paper presented at 
the meeting of the International Association of Survey 
Statisticians, Vienna, Austria, 1973. 

11. Gonzalez, Maria, and Hoza, Christine. "Small Area Esti- 

mation with Application to Unemployment and Housing 
Estimates." Journal of the American Statistical Associa- 
tion, 73 (1978), 7-15. 

12. Levy, Paul. "Small Area Estimation: Synthetic and 
Other Procedures, 1968-1978." National Institute on 
Drug Abuse, Research Monograph 24. Washington, D.C: 
U.S. Government Printing Office, 1979. 

13. National Center for Health Statistics. Synthetic State 
Estimates of Disability. PHS Publication No. 1759. 
Washington, D.C.: U.S. Government Printing Office, 1968. 

14. Purcell, Noel J., and Kish, Leslie. "Estimation for Small 
Domains." Biometrics, 35 (1979), 365-384. 

15. Robinson, J. Gregory, and Siegel, Jacob S. "Illustrative 
Assessment of the Impact of Census Underenumeration 
and Income Underreporting on Revenue Sharing Allo- 
cations at the Local Level." Paper presented at the 
August 1979 meetings of the American Statistical 
Association, San Diego, Calif. 

16. Siegel, Jacob S. "Estimates of Coverage of the Popula- 
tion by Sex, Race, and Age in the 1970 Census." Demo- 
graphy, 11 (1974), 1-23. 

17. Snedecor, George W., and Cochran, William G. Statistical 
Methods, 6th ed. Ames, Iowa: Iowa State University 
Press, 1967. 

18. U.S. Department of Commerce, Bureau of the Census. 
"Proceedings of the Census Undercount Workshop, 
September 5-8, 1979." 

Modifying Census Counts 

I. Richard Savage 

Yale University 


After the 1980 U.S. census, there will be pressure to modi- 
fy the population counts before they are used in the formula 
allocation of funds [2] . The Census Bureau has suggested 
the need for modification by its convincing argument about 
omission rates 1 in recent censuses; in 1970, the black omis- 
sion rate (7.7 percent) was four times the white omission 
rate (1.9 percent) [5] . The Bureau continues to clarify the 
situation and to develop data modification methods; July 
1978 was the end of a Census-sponsored National Research 
Council [4] project that evaluated plans for the 1980 census. 
The problems of census count modification were emphasized. 
At the same time, and with the same sponsorship, a project 
began on population and income data needed for small 
areas. This project must also include work on census data 
modification. A major current Census Bureau publication 
on the possibility of modifying census counts is [7] , Devel- 
opmental Estimates of the Coverage of the Population of 
States in the 1970 Census: Demographic Analysis, which 
will be referred to as SPRR, in honor of the authors, Siegel, 
Passel, Rives, and Robinson. SPRR was the stimulus for the 
preparation of this essay. 

In the remainder of this section, the SPRR title will be 
examined word by word to set out some of the themes 
of this paper, which is not a summary or review of SPRR. 
I do not assume my readers have read SPRR. 

"Developmental" is the key word. At the present time, 
the Census Bureau does not propose a method that would 
be acceptable for modifying census counts. The hard work of 
SPRR is only a small part of what would be needed to have 
an acceptable census data-modification method. SPRR and 
this essay consider future steps in the development of ac- 
ceptable methods. Reading SPRR, one cannot tell if the 
authors feel disappointed because they did not have much 
success in finding acceptable methods for modifying census 
counts. (See [7] , p. 105, "Levels of Usage.") 

The primary task of SPRR is to construct "estimates of 
the coverage of the population" rather than "estimates of the 
population." Thus SPRR examines the quality of the 1970 
census in contrast to modifying the census results. SPRR 
suggests plausible alternatives to the census counts in contrast 
to modifications which could replace the census counts. It 

is awkward to give a good analogy to the situation, so I will 
settle for several paragraphs of discussion. 

SPRR examines in detail the quality of the 1970 census. 
What are the sizes of the undercounts for individual popula- 
tion segments? The emphasis is not on finding an alternative 
set of population estimates. You could work in the spirit 
of SPRR in the following situation: A set of dependent 
observations have been regressed on a set of independent 
variables. You are given the least squares residuals and the 
values of the independent variables. From this you could 
learn much about the data and model. You could detect 
outliers, you could see if the errors are relatively uncor- 
related or if there are some built-in dependencies, and you 
could locate regions in which the model works well and 
where it works poorly. But you cannot sketch the response 
surface or predict the response for a given set of independent 
variables. If you were interested in the measurement process— 
possibly to improve it—rather than in the measurements, 
this situation might be satisfying. 

When a set of data are examined, it is often possible to 
spot errors, but how to correct the errors is not always 
evident. The recent paper by Lindley, et al. [3] discusses 
this problem in detail for the situation where a person pre- 
sents personal probabilities that are not consistent with the 
laws of probability. For example, what should be done when 
a person gives P(A) = 0.3 and P(not A) = 0.5? Clearly, at least 
one of the probabilities should be increased by at least 0.1, 
since together they fall short of 1.0 by 0.2, but the 
rational correction is not evident. Likewise, errors can be 
detected in the demographic data. In the 1970 census, about 
30 percent more blacks between ages 25 and 34 said they 
were born in the Northeast than birth and death records 
indicated. Although the evidence indicates that most of the 
error is in the State-of-birth data, 
the situation does not spell out the precise allocation of the 
error. Thus SPRR emphasizes the sources of trouble and 
plausible sizes of errors without attempting to correct the 
errors. 
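The incoherence example can be made concrete. The two corrections below are both coherent, which illustrates why the numbers alone do not determine the rational correction:

```python
# Stated personal probabilities that violate the probability laws:
p_a, p_not_a = 0.3, 0.5            # sum to 0.8 instead of 1.0
deficit = 1.0 - (p_a + p_not_a)

# Two of many coherent corrections: split the gap evenly, or rescale.
additive = tuple(round(v + deficit / 2, 3) for v in (p_a, p_not_a))
proportional = tuple(round(v / (p_a + p_not_a), 3) for v in (p_a, p_not_a))

print(additive)       # (0.4, 0.6)
print(proportional)   # (0.375, 0.625)
```

Both results obey the probability laws, yet they disagree; choosing between them requires information beyond the stated numbers, which is the point of the Lindley et al. discussion.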

"Population" is of particular concern, but other variables 
have measurement problems; it is very difficult to measure 
income, which plays an important role in the allocation 
formula for general revenue sharing. Attention is fixed on 
population because so much is known about it; it appears 

¹For expository purposes, this document will discuss rates for two 
races, "black" and "white." The paragraph between equations (18) 
and (19) indicates possible meanings for "black" and "white." In 
applications, more than two races are required. 

Work sponsored by the National Science Foundation 
under Grants 7810496-SOC and 7818166-SOC. 



in most uses of census data, it is a relatively easy concept to 
understand,² and it is relatively accepted as something 
appropriate for the Government to measure. 

The data and methodological resources of SPRR are 
strained even to estimate the omission rates for the "States." 
Even larger correction rates are likely to be needed at the 
sub-State level. Work at that level awaits new data, new 
methodology, and new money. 

SPRR's concentration on the 1970 census is appropriate 
in terms of available data. However, one census does not 
provide a sound foundation for the development and valida- 
tion of methodology. It might be enlightening to apply the 
SPRR methodology to new situations, such as the 1960 or 
the 1980 censuses. This is an awkward task, since the set of 
data associated with each census has substantial differences 
from the other censuses. 

SPRR discusses several methods for estimating omission 
rates. Its emphasis is on "demographic analysis," an approach 
that does not emphasize "random," as in "random error." 
Thus, for SPRR, "estimates" refer to plausible bounds on 
errors rather than to standard statistical concepts such as 
confidence intervals. 

Estimates of population sizes from demographic analysis 
would be hard to accept. The estimates do not make direct 
use of the census counts. In fact, the estimates are from 
independent noncensus data; they are replacements rather 
than modifications. Further, SPRR does not offer hope of 
ever directly checking the quality of the estimates from 
demographic analysis. But estimates that cannot be verified 
are suspect. 

I fear my remarks about SPRR will leave the wrong im- 
pression. My criticism stems in large part from my wish to 
see us move further along into this important area. The SPRR 
authors are to be commended. They have done a large task 
with great care and ingenuity, and they have maintained a 
cautionary position in spite of pressure to do otherwise. 


In 1970, there were more than 200 million people to be 
counted by the U.S. census. About 2.5 percent of the popu- 
lation were omitted— 1.9 percent of the whites and 7.7 per- 
cent of the blacks. SPRR's task is made more difficult because 
of lack of perfection in the data from counted individuals. 
Thus, State of birth is used as a foundation for SPRR's 
demographic analysis; 3.9 percent, or 3,957,000 whites 
under age 35, did not report a State of birth, and for blacks 
residing in the Northeast who were between ages 25 and 34, 
the State of birth was given in error about 30 percent of the 
time. The SPRR analysis of State-of-birth data should have 
independent interest to scholars and the public. 

SPRR examines the data in detail. Thus, it seems un- 
likely that additional major problems in the data remain to 
be discovered. One topic that the public might feel does not 
receive adequate attention is that of illegal aliens. If, as 
some believe, illegal aliens constitute an appreciable propor- 
tion of some local populations, then their omission in the 
SPRR analysis could be serious. (See [1] and [9], which 
explicitly recognize the problem.) It is not clear if the authors 
of SPRR have considered this problem and made provision 
for it. The Census Bureau uses a variety of techniques to 
impute the existence of some people and characteristics of 
some other people; the SPRR discussion does not overtly 
include this topic. 


Before attempting to discuss the needs for modification 
of population counts and the associated problems, it is 
efficient to develop some notation. Basic relationships be- 
tween the defined terms will be derived for their immediate 
interest and future needs. The beginnings of a statistical 
framework will appear. 

To avoid excessive complexity and wordiness, concepts 
will be introduced that do not exactly match census prac- 
tice. In particular, the population consists of two races, blacks 
and whites. Symbols such as b, β, B will be used for ob- 
served or computed numbers and proportions of blacks; 
w, ω, W for whites; and p, π, P for total populations. The 
observed or computed value of a count or proportion will 
typically not be the same as the target value corresponding 
to the definitions and theory underlying the observation 
process. These target values or parameters will be indicated 
by an arrow overscore, for example, w⃗. The number of re- 
gions, such as the 50 States and the District of Columbia 
or the four census regions of the United States, will be 
represented by n. The regions could more broadly be taken 
as categories like age groups or combinations of geographic 
and economic groups. The important point is that they form 
a partition of the population. Subscripts will identify regions. 
Symbols without subscripts will represent totals or averages; 
the context will make clear which. The term "States" will be 
used instead of "regions" for concreteness. 

Let b_i (w_i) be the census count of blacks (whites) in the 
ith State. Even "the census count" is an awkward term to 
define in practice (see comments below, regarding displays 
(1) through (4)); here it is assumed that the definition has 
been specified and that the counts are available. The census 
count of the population in the ith State is 

p_i = b_i + w_i    (1) 

The national counts are 

b = Σ b_i    (2) 

w = Σ w_i    (3) 

p = Σ p_i = b + w = Σ (b_i + w_i)    (4) 

²The technical and legal definition of population is complex and 
subject to controversy. See Robert Reinhold, "Dispute Over Aliens 
Snarls Census Plans," New York Times, December 21, 1979, pp. A1 
and A24. 

Displays (1) through (4) contain some very acceptable 
consistency relationships, such as the total census count is 
the sum of the black census count and the white census 
count. In practice, this might not be the case because race 
was not determined for all those counted, or the total count 
might be published before racial counts; hence, the racial 
counts might contain records obtained after the total count 
was given. The U.S. census and SPRR attempt to maintain 
such consistency, and SPRR requires the same consistency in 
its modifications. These consistency relations will be used in 
our analysis. Some of the detail is given in (8) through (17). 

Let upper case letters represent possible modifications 
of the census counts of the corresponding lower case letters; 
b_i becomes B_i, w_i becomes W_i, etc. The relationship between 
upper and lower case symbols can be expressed in terms of 
omission rates in the following ways: 

B_i = (1 + β_i) b_i, or β_i = (B_i − b_i)/b_i    (5) 

W_i = (1 + ω_i) w_i, or ω_i = (W_i − w_i)/w_i    (6) 

P_i = (1 + π_i) p_i, or π_i = (P_i − p_i)/p_i    (7) 

The selected modification process will generate the B_i 
and W_i, which will determine the β_i, ω_i, and π_i. In SPRR, the 
B_i and W_i are based on data that ideally are not related to the 
current census.³ Thus, such expressions as B_i = (1 + β_i)b_i 
should not be interpreted in the SPRR analysis as correcting 
the black State count to yield the modified black State 
count.⁴ Rather, (1 + β_i) is the ratio of two independent 
estimates of the black population in State i. The basic task 
of SPRR is the development of the estimates B_i and W_i. 
It gives several related estimates. Notice the intent is 
B⃗_i = b⃗_i, etc. That is, the census and the modification process 
have the same target population. The above relationships are 
not independent, since P_i = B_i + W_i, i.e., 

(1 + π_i) p_i = (1 + β_i) b_i + (1 + ω_i) w_i    (8) 

³In fact, the current census is used by SPRR for migration data. 

⁴It is, however, natural to speak of estimates of β_i, etc. In applica- 
tions of the SPRR and related methodologies, one is likely to esti- 
mate these rates for population strata that are finer than those for 
which demographic analysis provided alternative counts. 

Then, from (1), 

π_i = (β_i b_i + ω_i w_i)/(b_i + w_i)    (9) 

For the considered procedures, 

B = Σ B_i    (10) 

Now use (2): (1 + β)b = Σ (1 + β_i) b_i, so 

β = Σ β_i b_i / Σ b_i    (11) 

Similarly, 

W = Σ W_i    (12) 

ω = Σ ω_i w_i / Σ w_i    (13) 

P = Σ P_i    (14) 

π = Σ π_i p_i / Σ p_i    (15) 

P = B + W    (16) 

π = (βb + ωw)/(b + w)    (17) 

It is important to verify that (15) and (17) are consistent: 

π = (βb + ωw)/(b + w) = (Σ β_i b_i + Σ ω_i w_i)/(Σ b_i + Σ w_i) = Σ π_i p_i / Σ p_i 

where the equalities are from (17), (11 and 13), (9), and (15). 
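The consistency relations can also be checked numerically; the counts and omission rates below are invented for three "States":

```python
# Numerical check of displays (9) and (10)-(17): invented counts and rates.
b = [100.0, 50.0, 10.0]        # black census counts b_i
w = [400.0, 150.0, 190.0]      # white census counts w_i
beta = [0.08, 0.07, 0.06]      # black omission rates beta_i
omega = [0.02, 0.015, 0.01]    # white omission rates omega_i

p = [bi + wi for bi, wi in zip(b, w)]                            # (1)
pi = [(be * bi + om * wi) / pv                                   # (9)
      for be, om, bi, wi, pv in zip(beta, omega, b, w, p)]

B = sum((1 + be) * bi for be, bi in zip(beta, b))                # (10)
W = sum((1 + om) * wi for om, wi in zip(omega, w))               # (12)
beta_nat = sum(be * bi for be, bi in zip(beta, b)) / sum(b)      # (11)
omega_nat = sum(om * wi for om, wi in zip(omega, w)) / sum(w)    # (13)

pi_15 = sum(pp * pv for pp, pv in zip(pi, p)) / sum(p)           # (15)
pi_17 = (beta_nat * sum(b) + omega_nat * sum(w)) / (sum(b) + sum(w))  # (17)

print(abs(pi_15 - pi_17) < 1e-12)                  # (15) and (17) agree
print(abs(B + W - (1 + pi_15) * sum(p)) < 1e-9)    # P = B + W = (1 + pi)p
```

Both checks print True for any nonnegative counts and rates, since they restate the algebraic identities derived above.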
In the following analysis, it is assumed that the omission 
rates for the United States as a whole are correct; that is, 
P = P⃗ and π = π⃗, so that P = (1 + π)p could be written as 
P⃗ = (1 + π⃗)p. Similar remarks apply to B and W. Unless 
necessary, the arrows will not be used. The following 
numerical values from Census [5] are assumed: 

π = 0.025, β = 0.077, and ω = 0.019    (18) 

Since the census counts {b_i, w_i} are known, the values of 
b, w, p, B, W, and P are known. The problem is to estimate 
the {β_i, ω_i} and the {B_i, W_i, P_i}. 

To show a possible use of this model, consider the "basic" 
modification process. (This is considered again in sections 
7 and 8, but it is not a procedure seriously considered 
by SPRR.) In the basic procedure, for each State one de- 
fines β_i = 0.077 and ω_i = 0.019. Then one can compute 
P_i = 1.077 b_i + 1.019 w_i, and from (9) the π_i can be com- 
puted. As examples, this process yields omission rates of 
0.019 for South Dakota and 0.041 for Mississippi.⁵ The basic 
method used here is closely related to the synthetic basic 
of [6]. That report considers three races, and in table 1 the 
omission rate for the third race is 0. Thus, the table has 
omission rates less than 0.019! SPRR (p. 5, no. 13) works 
with two races: white, and black and other races. Table 
VII-G, based on the Current Population Survey, considers 
whites and other races, and blacks. (The heading of this table 
mentions an adjustment for imputation in the census.) 
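The basic procedure can be sketched with invented State counts; only the rates 0.077 and 0.019 come from the text, and the two calls below use black population shares chosen to land near the South Dakota and Mississippi figures quoted above.

```python
# The "basic" modification: apply the national omission rates in every
# State.  The State counts here are invented for illustration.
BETA, OMEGA = 0.077, 0.019

def basic_modification(b_i, w_i):
    """Return the modified count P_i and the implied omission rate pi_i."""
    P_i = (1 + BETA) * b_i + (1 + OMEGA) * w_i
    p_i = b_i + w_i
    return P_i, (P_i - p_i) / p_i

# A tiny black share gives pi_i close to 0.019; a share near 38 percent
# gives pi_i near 0.041.
print(basic_modification(2_000, 680_000))
print(basic_modification(380_000, 620_000))
```

Under this procedure π_i is a weighted average of 0.077 and 0.019, so it always lies between those two values, which is the point of footnote 5.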
The estimated omission in State i is 

Δ_i = P_i − p_i = π_i p_i = β_i b_i + ω_i w_i    (19) 

The error rate is 

R_i = Δ_i/P_i = 1 − p_i/P_i = π_i/(1 + π_i)    (20) 



Usually the β_i, ω_i, and π_i are positive, and their absolute 
values seldom exceed 0.1. Notice R_i is computed with the 
modified count in the denominator, while π_i has the census 
count in the denominator. With the presumption that the 
modified count is better than the census count, R_i is 
a better measure of the error rate than π_i; R_i and π_i seldom 
differ by much, and they always have the same algebraic 
sign. The analysis could be developed in terms of π_i or R_i. 
The use of π_i makes the expressions for going from counts 
to modification relatively simple; R_i would be useful in 
going from modification to counts. When 0 < π_i, then 

π_i − π_i² < R_i < π_i 
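The relation between R_i and π_i, and the bounds just stated, can be verified over a grid of plausible rates:

```python
# Check R = pi / (1 + pi) and the bounds  pi - pi**2 < R < pi  for 0 < pi.

def error_rate(pi):
    return pi / (1.0 + pi)

for pi in [0.001, 0.01, 0.025, 0.05, 0.1]:
    R = error_rate(pi)
    assert pi - pi ** 2 < R < pi      # the bounds in the text
    print(f"pi = {pi:.3f}  R = {R:.5f}")
```

For rates of a few percent, R_i and π_i differ only in the third decimal place, which is why the text says the two measures seldom differ by much.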


To this point, the notation has been purely descriptive 
and did not have any statistical framework. In the next para- 
graph, notation is introduced that would be useful in the 
statistical analysis of the modification process. In the appen- 
dix to this section (appendix A), several probabilistic models 
are given that describe in part the modification process. 
Those models are not presented in the body of this section 
because they are artificial and would be inappropriate to 
apply to data. The models illustrate types of reasoning and 
the existence of consistent mathematical structures to des- 
cribe the modification process. The purpose of this essay 
is not to construct useful modification procedures. The 
statistical models in later sections have elements of im- 
mediate interest. The models in the appendix might be useful 
in Monte Carlo studies of the modification process.⁶ 

⁵The basic estimates cannot yield π_i < 0.019. 

⁶For some situations, realistic stochastic models for census modi- 
fication are available. See the paper by Ivan Fellegi in these pro- 
ceedings. 

For each computed or observed quantity such as B_i or 
p_i, there can be a corresponding random variable designated 
by "*", such as B_i* and p_i*. A random variable such as p* 
can incorporate many different aspects of the data problem. 
Before a census is conducted, p* and its distribution func- 
tion summarize our knowledge of the count to be obtained 
of the population. The distribution is based on available 
demographic data and the planned performance character- 
istics of the forthcoming census. The distribution of B* 
after the census has been performed would depend on how 
the census process appeared to operate and on the used 
modification procedures. In some circumstances, it is neces- 
sary to use a strong subjective component in evaluating these 
distributions, while in others, it will appear that ample 
distributions, while in others, it will appear that ample 
frequency information is available. For example, if the 
States all have the same distribution for B_i*, and this distri- 
bution does not change between censuses, there is likely to 
be agreement on the common distribution of B_i*. 

The emphasis here is not on finding these distribution 
functions but to point out their existence. As a corollary, 
because the modification process is in a probability frame- 
work, it will be possible and appropriate to subject the modi- 
fication process to statistical analysis. 

The hard work— not begun here— includes: 

1. Selecting an adequate philosophical base. The material 
at hand does not readily respond to a naive frequentist 
approach. As in many social problems, it appears that 
the entire population is observed; that is, the fixed 
population of States. This can be circumvented by such 
devices as (a) thinking of "the States" as a sample 
from a hypothetical population of "States" or (b) 
thinking of sub-State units as the basic random ele- 
ments. Since each census has so many special features, 
it is not useful to think of the population of U.S. 
censuses. It is useful to think of the populations of 
special censuses, census checks, etc. 

2. The choice of philosophical base will be intimately 
woven into the kinds of models that are found appro- 
priate to describe the phenomenon and the kinds of de- 
cisions and inferences for which the models are made. 
This work is also in the developmental stage [10]. 

Before closing these general remarks, note that one might 
need the joint distribution of several random variables, such 
as B_i*, B_j*, and ω_i*. For the analysis in this paper, only the low 
moments of random variables will be used. Thus, the most 
complex item of interest would be a covariance. 


The State omission rates, when national values are applied 
at the State level, were indicated in (18). SPRR seriously 
considers other assumptions that generate error rates as large 
as 9 percent, and some of their extreme assumptions result 
in rates over 11 percent. (The States that generate extreme 
values are Alaska and New Mexico; SPRR considers features 
other than the white-black dichotomy.) It is hard to tell if 
these percentages are important. To begin to understand 
their significance, it is useful to look at how population 
counts are used. 
A small change in the population of a State can change 
by one the size of its congressional delegation. It appears 
to take a change of about 470,000 for the next seat to be 
gained or lost. California, with a population of 2 × 10⁷ and 
with an omission rate in the range 2.8 to 4.7 percent [7, table 
VIII-D], has an omitted population between 5.6 × 10⁵ 
and 8.8 × 10⁵. So California would appear underrepresented 
by at least one seat. But this analysis is too simple. If the 
California count is to be corrected, then the other State 
counts must be corrected. In fact, when the adjustments 
of SPRR are made, changes in the sizes of the delegations 
are relatively hard to predict. Oklahoma loses a seat for 
most of the proposed ways of modifying the count. Cur- 
rently, Oklahoma has six Members of Congress, the average 
size of a district is quite small (427,000), and the omission 
rate is small. None of the modifications that have been 
analyzed by SPRR resulted in a change of a State's delega- 
tion by more than one seat or in a change for the California 
delegation (but, see [6], p. 13). 
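The low end of the California arithmetic can be checked directly; the population is rounded to 2 × 10⁷ as in the text, and the upper figure quoted there depends on a more precise population base than this rounded one.

```python
# Back-of-envelope check: at the low omission rate of 2.8 percent, the
# omitted California population already exceeds the ~470,000 change
# associated with gaining or losing one seat.
population = 2e7           # California, rounded as in the text
per_seat = 470_000
omitted_low = 0.028 * population
print(omitted_low)               # about 5.6 x 10**5
print(omitted_low > per_seat)    # True
```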

The consequences of census omissions for within-State 
apportionment have not been extensively studied [6]. Al- 
though modification of census counts has not been con- 
sidered as a part of the apportionment process, the con- 
sequences of omission must be known to judge how well the 
Supreme Court mandate—one-man-one-vote—is satisfied. 

At this time, the monetary value of public data of varying
quality has not been approximated. Studies of the consequences
of varying the quality of public data, such as the above work on
apportionment, would serve as guides to the needs. Policy
analysts, legislators, and special-interest groups should find
such studies useful.

Modified population counts are likely to be used in the 
formula allocation of funds. The formulas in current use at 
the Federal level are complex. Here, we briefly consider three 
simple allocation processes. 

Application. Each individual applies for his appropriate 
amount. Food stamps would be an example. "Individual" 
could be a school district, town, etc. In this allocation 
process, the individual must be able to supply data of acceptable
quality. The Government needs national data for
budgeting, and some local data of limited quality are needed 
for setting standards and administration. Application has dis- 
advantages, such as personal and administrative costs of the 
applications. When the individuals are governments, there 
could be substantial local needs of the kinds of data now 
provided by the national Government. 

Open ended. Assume the Government wishes to distribute
to each State (or other unit of government) a fixed amount
of money per individual in the State, say $A. Certain education
funds are distributed to selected portions of the population
in this manner.

In the open-ended allocation, if the counts are not modified,
the ith State will receive

A p_i,  where p_i = w_i + b_i,

and the total allocation will be Ap. Modifications are made
in the following manner: Estimate the omission rates, say,
β_i and ω_i, and then compute the modified counts

W_i = w_i (1 + ω_i)  and  B_i = b_i (1 + β_i);

the modified State i population is P_i = B_i + W_i.


Thus, with the modification, the ith State will receive

A P_i

and the total allocation will be

AP = A Σ_i P_i



The change in allocation to the ith State, as a result of
modification, is

A(P_i − p_i) = A p_i π_i,  where π_i = (P_i − p_i)/p_i,   (26)

and the total change is

A(P − p) = A p π,  where π = (P − p)/p.

For later reference, we make the following observations.
The change in an allocation is A times the corresponding
population modification. Now using the mode of thought
begun earlier, one obtains: (1) The "average change" or
"standard deviation of change" of allocation is A times the
corresponding figure for population. (2) The "correlation"
between modifications in population and allocation is
1; the relative changes in population and allocation are
identical.
This allocation process is attractive: (1) It is simple to
apply and to explain. (2) In most situations when modifications
are made, each State will receive an increased allocation.
(3) Further, if the Government replaces A by 0.975A, the
open-end feature does not result in a total allocation larger
than the one planned, $Ap.

The amount of money involved in the changes can be
substantial. If the total allocation is $5 × 10⁹, then the
amount involved in the changes is about 0.025 × 5 × 10⁹
= $1.25 × 10⁸. This amount would be visible to the States.
Hence, there would be tremendous pressure to make changes,
and the method of change would be hotly discussed.

⁷Recall, "States" refers to any partition of the population.
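A minimal numerical sketch of the open-ended process, with invented counts and rates ($A is taken to be $100 per person):

```python
# Hypothetical per-capita amount $A and State counts (w = white, b = black),
# with invented omission rates omega (white) and beta (black).
A = 100.0
states = {
    "X": {"w": 900_000, "b": 100_000, "omega": 0.02, "beta": 0.08},
    "Y": {"w": 400_000, "b": 200_000, "omega": 0.03, "beta": 0.10},
}

changes = {}
for name, s in states.items():
    p = s["w"] + s["b"]                                        # census count
    P = s["w"] * (1 + s["omega"]) + s["b"] * (1 + s["beta"])   # modified count
    pi = (P - p) / p                                           # State rate pi_i
    changes[name] = A * (P - p)                                # = A * p_i * pi_i
    print(f"{name}: +${changes[name]:,.0f} (pi_i = {pi:.4f})")

total_change = sum(changes.values())
print(f"total extra outlay: ${total_change:,.0f}")
```

Every State gains, and the total outlay exceeds the planned $Ap by roughly the factor 1 + π, as in the text.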

The process outlined in this section would in practice 
be quite complicated. Instead of 50 States, the process could 
be applied to the large number of units in general revenue 
sharing. In addition to race, modifications of population 
values could use other variables such as age or income. The 
modification process can involve data from many sources 
and need not be the same for all areas. 

Fixed pie. The Government allocates $A′ to the States
in proportion to their populations. Thus, State i receives

A′ (w_i + b_i)/p = A′ p_i/p

If the counts are modified with the same method used for
open-ended formulas, the modified allocation to State i is

A′ [w_i (1 + ω_i) + b_i (1 + β_i)] / Σ_j [w_j (1 + ω_j) + b_j (1 + β_j)] = A′ P_i/P

The total allocation remains at $A′.

The change in allocation to the ith State as a result of the
modification process is:

A′ [P_i/P − p_i/p] = A′ (p_i/p) [(1 + π_i)/(1 + π) − 1]
                  = A′ (p_i/p) (π_i − π)/(1 + π)   (30)

Some of the changes will be positive and some negative. The
total change is 0.
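The same toy figures illustrate the fixed-pie process; here the invented pot A′ is set to the planned open-ended total, and the gains and losses cancel exactly:

```python
# Hypothetical fixed pot A' (= $100 x 1.6 million people) and the same
# invented State counts and omission rates as in the open-ended sketch.
A_prime = 160_000_000.0
states = {
    "X": {"w": 900_000, "b": 100_000, "omega": 0.02, "beta": 0.08},
    "Y": {"w": 400_000, "b": 200_000, "omega": 0.03, "beta": 0.10},
}
p = {n: s["w"] + s["b"] for n, s in states.items()}
P = {n: s["w"] * (1 + s["omega"]) + s["b"] * (1 + s["beta"]) for n, s in states.items()}
p_tot, P_tot = sum(p.values()), sum(P.values())

# Change in allocation: A' * (P_i/P - p_i/p); the total change is 0.
changes = {n: A_prime * (P[n] / P_tot - p[n] / p_tot) for n in states}
print(changes, sum(changes.values()))
```

State X, whose omission rate π_X is below the national rate, loses money even though its count was revised upward; only relative position matters under a fixed pie.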

When the total allocations, open-ended and fixed-pie,
are approximately equal, say, Ap = A′ = AP, the changes between
open-ended and fixed-pie should be compared. The
following quotation will guide the analysis [7, pp. 106-107]:

We have compared the distribution of $1 billion among
the States on the basis of the census counts and the distribution
of $1 billion among the States on the basis of
three sets of corrected population figures (table VIII-A).
The illustrative comparison shows that the size and variation
of the percentage shifts in the funds apportioned
among the States on the basis of the corrected population,
as compared with the distribution on the basis of the
census population, are much smaller than the size and
variation of the rates of underenumeration (that is, percentage
shifts in population among the States) used to
correct the population.

In general, simple apportionment formulas dampen considerably
the effect of any variable adjustment of a set
of data.

The analysis places these remarks in a formal framework.
Write the final expression in (30) in the form:

A′ (p*/p) (π* − π)/(1 + π)   (31)

Here, p_i and π_i have been replaced by the random variables
p* and π*. Not using the subscript i indicates that one State
is much like another from the viewpoint of this analysis of
errors in the counts and in the modification process. The
unsubscripted variables without *'s refer to the national
value, while * refers to a State.

The following plausible assumptions will be used:

1. p* and π* are independent.

2. E p* = p/n, where n is the number of States.

3. E π* = π (= 0.025), and π* is approximately normal with
variance σ². Since π* never exceeds 0.1, σ² < 0.01,
and 0.0001 appears as a typical value for σ².

Under these assumptions,

E[ A′ (p*/p) |π* − π|/(1 + π) ] = A′ (E p*/p) (0.8) σ/(1 + π)   (32)

Now, with the same assumptions, consider (26) in the form
A p* π*, so that

E[ A p* π* ] = A (E p*) π

The ratio of the expected change with open ended to the expected
absolute change with fixed pie is:

A (E p*) π / [ A′ (E p*/p) (0.8) σ/(1 + π) ]   (35)

Assume A′ = AP, or equivalently, A′ = Ap (1 + π). Thus,
(35) becomes

π/((0.8) σ)   (36)


(In these computations, no distortion of practical results
will occur due to the use of E π* instead of E |π*|.)

The fraction in (36) is unbounded above, since σ = 0 is
a possible modification process. Values of 2 or 3 for (36)
seem likely. The (36) ratio does give a quantitative expression
for some of the above quoted remarks of SPRR. The
minimum value of the fraction is

n/(2(n − 1)),

attained when all the π_i's equal 0, with the exception that one of them
equals nπ, and the probability that a particular State is
selected is 1/n. This is an extreme case, but it indicates that the
SPRR conclusion about the size of shifts is not universal. The
condition A′ = Ap (1 + π) results in equal total allocations
by both processes.

Next, we compare the variances of the shifts in allocations.
From (26), for the open-ended process, the relevant random
variable is A p* π*, and from (30), for the fixed-pie process,
the relevant random variable is A p* (π* − π). We first compute

(1/A²) { V(A p* π*) − V[A p* (π* − π)] }
   = E(p*π* − E p*π*)² − E[(p*π* − E p*π*) − (p*π − π E p*)]²

   = 2π E(p*π* − E p*π*)(p* − E p*) − π² V(p*)   (38)

If p*π* and p* are uncorrelated, this reduces to −π² V(p*).
In this case,

[ V(A p* π*) − V(A p* (π* − π)) ] / [E(A p* π*)]² = −V(p*)/(E p*)²

Further, the difference (38) could be negative when the correlation
between p*π* and p* is small positive or negative.
This is in disagreement with SPRR; the dampening effect
of fixed pie is then less than that of open ended.
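Since (38) is an algebraic identity in the first two moments, it can be checked mechanically by simulation; the joint distribution of (p*, π*) below is invented purely for illustration, and the identity should hold for any joint distribution:

```python
import random

# Monte Carlo check of identity (38), with an invented correlated
# joint distribution for (p*, pi*): larger "States" get larger rates.
random.seed(1)
A, pi_bar = 100.0, 0.025

draws = []
for _ in range(100_000):
    p = random.uniform(0.5e6, 2.0e6)                 # a random "State" size
    t = pi_bar + 0.01 * (p / 1e6 - 1.25) + random.gauss(0, 0.005)
    draws.append((p, t))                             # correlated (p*, pi*)

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

ps = [p for p, _ in draws]
xs = [p * t for p, t in draws]                       # p* pi*        (open ended)
ys = [p * (t - pi_bar) for p, t in draws]            # p* (pi* - pi) (fixed pie)

lhs = (var([A * x for x in xs]) - var([A * y for y in ys])) / A**2
rhs = 2 * pi_bar * cov(xs, ps) - pi_bar**2 * var(ps)
print(lhs, rhs)  # the two sides agree
```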

In thinking of the political interpretation of modification,
it is natural to compare States. A simple measure of this comparison
is

(change in allocation/original allocation)_i
   − (change in allocation/original allocation)_j   (39)

When multiplied by 100, this is the difference in percentage
change in allocation between the States. If this formula is
computed for the open-ended allocation, one obtains

A p_i π_i/[A(w_i + b_i)] − A p_j π_j/[A(w_j + b_j)] = π_i − π_j   (40)

And if (39) is computed for the fixed-pie allocation, one
obtains

[A′ (p_i/p)(π_i − π)/(1 + π)] / [A′ p_i/p]
   − [A′ (p_j/p)(π_j − π)/(1 + π)] / [A′ p_j/p]

= (π_i − π)/(1 + π) − (π_j − π)/(1 + π) = (π_i − π_j)/(1 + π)   (41)

Thus, the comparisons between States of the percentage
modification in allocations are practically the same for open-end
and fixed-pie methods. If politicians look at the problem
in terms of relative advantage, the dampening effect from
fixed pie versus open end will be negligible.

The real advantage of fixed pie appears to stem from it 
sometimes yielding smaller changes. If comparisons between 
States are not made because the changes appear small, then 
the fixed-pie allocation will cool the political forces. Actually, 
much of the discussion of modification of allocation centers 
on equity. It is not entirely clear what "equity" means, but 
most usage of this term will involve the kinds of comparisons 
suggested above. 

The strongest apparent argument for data modification 
is that we are sure the census data contain errors. Large 
sums of money are allocated with the use of census data. 
Since the data have errors, the allocations have errors. Hence 
we should remove the errors. Some comments on this argu- 
ment are: 

1. It is not obvious that we can much improve on the 
census data [7, p. 2] . 

2. If the counts are not corrected, those communities that 
have done their civic duties well are at an advantage. 
Not correcting could encourage people to be counted 
(twice!); correcting could decrease the incentive to be 
counted. The Constitution might intend to use this 
device to help the census, see comments in [2] . 

3. It is likely that different counts will be used for differ- 
ent purposes, such as allocations and apportionment. 
This could result in loss of confidence in the Census 
Bureau for all activities. 

4. Any corrections made would not be perfect, and there 
is no consensus on which distribution of errors would 
be most equitable [7, p. 107] . 


The separation of demography and statistics is well recog- 
nized. A conference was held at the beginning of the 1970's 
at the East-West Center of the University of Hawaii to ex- 
plore the reasons for this lack of interaction. There is evi- 
dence that the conference had little effect. SPRR makes 
limited use of statistical reasoning. Probabilistic concepts 
are not used either to describe "errors" or to assess the 
quality of "estimates." The SPRR analysis has two specific 
aspects that are particularly puzzling to me; I am not sure 
whether the puzzle arises in my role as a statistician or as a 
general observer. To proceed, a few comments will be made 
on the subjects of this paragraph and then more specific 
comments will be made on the methodology. 

Begin with a puzzle. Since the 1970 census contains 
errors, if we want to improve on the census, our modifica- 
tions should (not) depend on the counts in the 1970 census. 
SPRR [7, p. 3] insists on "should not"; in chapter VIII, words
like "correct" and "adjustment" mean "replace the census
count with an independent estimate." The SPRR "estimates"
are formed without using the 1970 counts. SPRR would
prefer no use of census data, but the census was the only
useful source of migration data.

I think every effort should have been made to use the
1970 counts;⁸ otherwise, one throws the baby out with the
wash. When one has a measurement process that contains
random errors and biases, the usual practice is to calibrate 
the process with alternative information; only in extreme 
cases do you throw out the original process. (It is important 
to recall that the vast bulk of SPRR is devoted to assessing 
the errors in the census; SPRR is not concerned with "ad- 
justments" of the census.) 

The second puzzle, closely related to the first, is: There 
is (not) a way to calibrate the current (1970) census counts. 
SPRR insists on "is not." (See [7], p. 3, "Ultimate untesta- 
bility"; p. 9, right-hand column, "indirect"; p. 20, "cannot 
be answered"; p. 24, "arbitrarily"; p. 26, "evidence cannot be 
found"; p. 40, "metaphysical quality"; pp. 40 and 83, "un- 
testable assumptions"; p. 94, "lacking any formal empirical 
basis"; and p. 104, left-hand column, paragraph 2, "no-way.") 
Part of this negativism corresponds to judgmental decisions 
that occurred in a very complex analysis. The general result 
of the SPRR analysis is that the authors are working on a 
problem where predictions-hypotheses cannot be tested. 
They have a battery of internal or indirect checks on the 
quality of their estimates [7, p. 9, right-hand column] . 
They have put their developmental effort into estimation of 
census "error," but they did not address the problem of ob- 
taining a tested method of adjusting the census. 

SPRR presumably felt their analysis must precede the cali- 
bration problem. They might feel that the calibration prob- 
lem is meaningless or impossible. If one is completely nega- 
tive about the regularity of (social) nature, there can be no 
science. (SPRR is not extreme in its negativism; it makes 
much use of life tables.) 

SPRR has considerable statistical strength. Results regard- 
ing samples are well handled. It does not, however, have a 
unifying concept of "error," as in subjective statistics. A 
frequent SPRR interpretation of "error" is a discrepancy 
between two proposed values for the same numerical concept. 
An analogue for the statistical distribution of errors is the 
collection of "errors" arising from all of the proposed values 
for the same concept. Although the analogy is somewhat ex- 
ploited (for example, ranges of "errors" are considered), 
it does not receive an overt development. It is my impres- 
sion that this treatment of "error" will not generate a satis- 
factory methodology. One sees the authors of SPRR attempt- 

⁸No doubt this is a Catch-22 criticism of SPRR. If SPRR had
some interest in giving methods to be used for census modification, 
then I insist on the argument. If SPRR were just developing tech- 
niques to explore the coverage rate, then I would think they are too 
late with too little. 

ing to describe the intervals they have found for the omission
rates (pp. 91-92). The vocabulary is informal but
strives to be quantitative: the intervals are described as
"acceptable," "good," "adequate," "not wholly adequate,"
"inadequate," or "too broad."

Alone, I cannot argue the following position in detail or 
with great success. But if I were now to be involved in a 
substantial effort to modify the census, my basic approach 
would be to think of it as a problem in statistical calibration; 
I would try to use the current census data, I would try to 
have a broad and unified concept of error, and I would strive 
for testable predictions. 


In demographic analysis, one obtains several different 
methods to estimate a quantity. If the estimates differ among 
themselves, then the data used or the logic behind the esti- 
mates are not perfect. The following applies to those situa- 
tions where one can demonstrate the source of differences 
is the data and not faulty logic. If one of the methods is a 
standard or is known to yield results close to the quantity 
being estimated, then one can actually pinpoint which data 
sets are in error and the sizes of the errors. 

As a special case of demographic analysis, consider the 
number of white people under age 35 on April 1, 1970, who 
should have been counted in the U.S. census. An estimate of 
this quantity would be one of the published census results. 
Another method of estimation is to use demographic logic, 
that is, use the relevant numbers of people who were (1) 
born in the United States, (2) died in the United States, 
(3) immigrated to the United States, and (4) emigrated from 
the United States. These figures are combined as (1) − (2) + (3)
− (4) to give the demographic estimate of the population
size. Of course, none of these quantities, (1), (2), (3), (4), 
are known exactly. From other sources, it is known that 
(1) and (2), as reported, are too small, and hence they are 
"corrected." The properties of (3) and (4) are not well 
known. Nevertheless, the demographic estimate in this case 
is considered to be much closer to the true population size 
than the census count. Hence, one minus the ratio of the census
count to the demographic estimate is the omission rate for
the white population under 35 in 1970. This near ideal of
demographic analysis involves a substantial amount of work. 
Other applications have additional complications, and it is 
not always clear where the errors arise nor how large they 
are. The task of demographic analysis is like putting a jigsaw 
puzzle together when the pieces have been worn and the 
picture has no border. From the final puzzle, the size of the 
original picture will not be clear. 
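The bookkeeping behind such a demographic estimate is simple arithmetic; the sketch below uses invented figures standing in for the corrected components (1) through (4):

```python
# Hypothetical demographic bookkeeping for one population segment
# (all figures are invented for illustration).
births      = 60_400_000   # (1) corrected count of births
deaths      =  2_100_000   # (2) corrected deaths within the cohort
immigration =  3_300_000   # (3) immigrants
emigration  =    900_000   # (4) emigrants

# Demographic estimate: (1) - (2) + (3) - (4).
demographic_estimate = births - deaths + immigration - emigration
census_count = 59_000_000  # hypothetical census figure for the same group

# Omission rate: one minus (census count / demographic estimate).
omission_rate = 1 - census_count / demographic_estimate
print(f"{omission_rate:.3%}")
```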

In the above example, the two estimates are based on en- 
tirely different data and most of the error is ascribed to one 
data source. Insofar as two estimates use the same data (in 
the same manner), one does not obtain a simple view of the 
error in the common data. 


In this simple example, the obvious⁹ estimate of the
national count is the demographic estimate, which then suggests
that for this purpose the census is not useful! When the
demographic problem becomes more complex,¹⁰ it might
be useful to use some census data— but not the population 
counts— in forming the estimates. For example, to estimate 
the number of white persons in each State under age 35, it 
is useful to have information on between-State migration; 
such data are available from the census question on "State 
of birth." The base of demographic analysis is bookkeeping,
or the conservation of humans over time. Nonbookkeeping techniques
include the use of life tables or sex ratios obtained
from one population applied to other populations. Census
counts are used in some parts of demographic analysis; for
example, if the size of a cohort increases between censuses
(without net migration), one has a good idea of the location
of troubles. Examples of this kind occur for black
working-age males.

Demographic analysis does not emphasize (1) use of nondemographic
data, such as macroeconomic or physical data
(number of cars, etc.) or social data (number of imputations);
(2) use of relationships other than conservation (regression
analysis is excluded); and (3) predictions and hypotheses
that can be tested. The artificial exclusion of data and standard
techniques discourages confidence in demographic
analysis, either for the purpose of locating errors or of modifying
census counts. The lack of testable predictions takes one
out of the realm of science. (Demographic analysis has
built-in attitudes that require reasonable results, such as that
adjoining States should have similar results and that derived sex
ratios should be consistent with past experience. Again,
see [7], p. 9, right-hand column.)


If the same population segment is examined several times
(census, birth certificates, social security, etc.), and if the
records for individuals are matched where possible, then the
numbers of nonmatched records can be used to estimate
omission rates. Although this procedure is occasionally used,
its cost¹¹ prohibits extensive use, as would be required to
obtain estimates of State and sub-State omission rates.
Because the procedures used to obtain large sets of data are
very complex, the matching method cannot be applied in a
simpleminded manner. For example, imputed census records
cannot be matched to other records. Matching theory is
straightforward in the unusual situation where the causes
for omission are statistically independent in the data sets.
The criticisms of demographic analysis apply to matching.
SPRR uses a composite method where results from demographic
analysis and matching are combined by taking
averages.

Once useful estimates have been obtained for omission
rates at some geographic level, say, States, one could apply
those rates to smaller geographic areas. For example, if State
omission rates by race, ω_i and β_i, are available, then one
could compute

P_ij = w_ij (1 + ω_i) + b_ij (1 + β_i)

where P_ij is the estimated population and (w_ij, b_ij) are the
census racial counts for the jth region in State i. This method
is an obvious exploitation of the demographic results (ω_i, β_i).
The exploitation has serious appeal: (1) It makes direct use
of census counts (w_ij, b_ij). (2) The method makes many
predictions; much use of the (ω_i, β_i). (3) Some of the predictions
can be checked, since the P_ij are relatively small compared
to the P_i.¹² It is not uncommon to have recounts.
In 1977, there were 258 special censuses (see [8]).¹³ Also,
careful matching of records for relatively small populations
can be used. These checks are expensive and imperfect.
But it is essential to have some empirical base to
check the (demographic) analysis. At least there is an indication
of a possible empirical verification of the analysis. Other
methods of verification should be sought.
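The synthetic calculation itself is a one-line formula; a sketch with invented State rates (ω_i, β_i) and invented sub-State counts:

```python
# State-level omission rates by race (invented), applied synthetically to
# sub-State census counts, per P_ij = w_ij(1 + omega_i) + b_ij(1 + beta_i).
omega_i, beta_i = 0.020, 0.075   # hypothetical rates for State i

regions = {                      # invented census counts (w_ij, b_ij)
    "city":   (300_000, 120_000),
    "suburb": (450_000,  30_000),
    "rural":  (250_000,  50_000),
}

estimates = {
    name: w * (1 + omega_i) + b * (1 + beta_i)
    for name, (w, b) in regions.items()
}
for name, P in estimates.items():
    print(f"{name}: {P:,.0f}")
```

Every region inherits the State rates, so the method makes one checkable prediction per region, which is exactly the appeal noted above.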

If, as a result of these checks, the predictions appear inadequate,
other models could be fitted. For example, [6]
introduced ω_L, ω_H, β_L, and β_H, where (ω_L, β_L) are omission
rates (national) for low-income people and (ω_H, β_H)
the corresponding rates for people at other income levels.
These rates must satisfy

w(1 + ω) = w_L (1 + ω_L) + w_H (1 + ω_H)

b(1 + β) = b_L (1 + β_L) + b_H (1 + β_H)   (43)

where (w, b) are the total counts of the races and (w_L = w − w_H
and b_L = b − b_H) are the total numbers of poor of each race.
Then estimate P_i, using obvious notation, by

P_i = w_iL (1 + ω_L) + w_iH (1 + ω_H) + b_iL (1 + β_L) + b_iH (1 + β_H)   (44)

The State counts by race and economic level are available
from the census. Siegel [6] was led to this model in an
effort to explore the consequences of various assumptions;
doing a sensitivity study, he "arbitrarily" chose ω_L
= 2ω_H and β_L = 2β_H. At the sub-State level we might need
quantities such as (ω_iL, ω_iH, β_iL, β_iH). As a result of the
checks on the simpler model (ω_i, β_i), there should be some
empirical evidence to help estimate the ω_iL and β_iL.

⁹This would be the case if the only possibilities were the demographic
analysis and the census count, but these are not the only
possibilities.

¹⁰The demographic analysis of the Hispanic population appears
very complex; at least, it brings many new problems. (See [9].)

¹¹A strategy that needs exploration is the expenditure of large
sums, on the order of $100 million, to help modify census counts
and prepare intercensal estimates. See Roberts' comment in [2].

¹²Some of the following material is anticipated by [6, p. 13] and
[7, pp. 4-5].

¹³The implicit suggestion being made here is that many of the
activities of the special census could be used to help in the modification.
Because of time of occurrence and procedures used, recounts
and special censuses are not directly comparable to the decennial
census. Nevertheless, with careful coordination and modest procedural
changes, such activities as recounting and special censuses
might play an important role in the analytical activities of the Bureau
of the Census.
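Given the census counts by income level, constraint (43) plus Siegel's sensitivity assumption ω_L = 2ω_H pins down both rates, since wω = (2w_L + w_H)ω_H. A sketch with invented counts and an invented national rate:

```python
# Solve w(1 + omega) = w_L(1 + omega_L) + w_H(1 + omega_H)
# under the sensitivity assumption omega_L = 2 * omega_H.
# All counts and the national rate are invented for illustration.
w_L, w_H = 40_000_000, 140_000_000   # low-/high-income counts; w = w_L + w_H
w = w_L + w_H
omega = 0.020                        # hypothetical national omission rate

omega_H = w * omega / (2 * w_L + w_H)   # from the constraint above
omega_L = 2 * omega_H
print(round(omega_L, 5), round(omega_H, 5))
```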

Other models can be explored empirically, such as

P_ij = w_ij (1 + ω_i) + b_ij (1 + β_i) + Σ_k γ_ik (x_ijk − x̄_ik)   (45)

where x_ijk is the kth economic or physical or social variable
associated with the jth region in the ith State. Such models
are very general, so that substantial consideration is required
to select those models worthy of study. Notice, the x_ijk can
be functions of population counts.

The above exploitations¹⁴ of the output of the demographic
analysis appear to place that analysis as a sensible
part of the estimation problem. There is an opportunity¹⁵
for empirical verification and a coherent statistical analysis.
A result of such analysis could be to arrive at a simpler
model, such as

P_ij = w_ij (1 + ω) + b_ij (1 + β) + Σ_k γ_k (x_ijk − x̄_k)

At this point, we can ask why a substantial effort was
made to find (ω_i, β_i) in preference to the available (ω, β).
There are acceptable (to Census) values for ω and β, corrected
for sex and broad age groups. Use of these corrections
does not appreciably change the values of the {P_i}, and the
use of ω_L and β_L (as above) was inconsequential. (See
[6], p. 11.) Since SPRR was "developmental," the work
might have been done as an exercise in sensitivity analysis.
A possible motivation for the work was empirical evidence
that there was substantial variation among the {ω_i} or
{β_i}. Reference 6 summarizes the empirical evidence regarding
differential undercount. The table in the right-hand
column of p. 6 indicates there is a regional effect as well as
a racial effect. Apparently, a set of State estimates has been
prepared with this evidence. (See [4], p. 6.) An explicit
argument from the empirical evidence to the plausible
underenumeration rates has not been made.

Let the populations generated for the States by application
of the national values of (ω, β) be called basic. The
sensitivity studies suggested by this evidence include the
following results:

1. Inclusion of information on age and sex does not cause
substantial change from basic.

2. Assuming the omission rate for low income is twice
that for high income does not cause a substantial
change from basic.

3. Making omission rates proportional to income (the
precise procedure is not explained) yields more
variability and more large values for the State omission
rates than basic.

4. With "education" replacing "income," the results are the same
as in 3 above.

Although the sensitivity study shows plausible results,
the evidence is not convincing that any of the assumptions
used is better than basic. (The alternative computations
were done primarily to illustrate the consequences of variations
in omission rates for population segments, but presumably
the cases were picked because of their plausibility.)

Basic is not appealing to some because it does not create
very much variability between the States: general impressions,
continued interest, and sometimes heated discussion
suggest there should be big differences. Thus, part of the
motivation to search for (ω_i, β_i) in preference to (ω, β)
is to find an anticipated substantial variation in the π_i.

Let π* be generated from the basic synthetic method; that
is,

π* = (βb* + ωw*)/(b* + w*)   (46)

Let π** be generated by a synthetic method, where the
β_i and ω_i need not be constant. So

π** = (β*b* + ω*w*)/(b* + w*)   (47)

¹⁴The demographic analysis supplies a part (possibly the major
part) of the total modification.

¹⁵The methodology requires the data for verification. Demographic
analysis can get along without it as far as it can go. Having
the need for data to apply the statistical methods can help obtain the
data. Unless such data are created, the results of every modification
procedure cannot be assessed. If such data are not to be obtained,
one must consider abandoning the idea of modifying census data.

Assume, as usual,

E π** = E π* = π = (ωw + βb)/(w + b)   (48)

and that (b*, w*) is the same random variable in (46) and
(47). Now compute

V(π**) = E(π** − π* + π* − π)²

       = V(π*) + 2C(π* − π, π** − π*) + V(π** − π*)   (49)

A sufficient condition for π** to have larger variance than
π* is to have C(π* − π, π** − π*) ≥ 0 (and π* ≠ π**). This
will happen when π* (the basic modification) and π** − π*
(the additional modification) are uncorrelated; if the additional
modification is "noise," this will be the case. For example,
assume

β* = β + δ*   and   ω* = ω + ε*

where E δ* = E ε* = 0 and (δ*, ε*) is independent of (b*, w*).
Then

C(π*, π** − π*) = E[ ((βb* + ωw*)/(b* + w*)) ((δ*b* + ε*w*)/(b* + w*)) ] = 0   (50)

Thus, any modification beyond basic, which is independent
of (b*, w*), will increase the variance; that is, V(π**) > V(π*).
In fact, for this model,

V(π**) = V(π*) + E[ ((δ*b* + ε*w*)/(b* + w*))² ]

In summary, one would anticipate V(π**) > V(π*) in many
situations; the occurrence of the inequality is not suggestive
that the modification was particularly worthy.

It also should be noticed that if E π** = E π* = π and
V(π**) > V(π*), then π** will tend to have more large
observations than π* (assume that P(π** < 0) = P(π* < 0) = 0).
As an example, assume X = X(m_1, m_2) is a Beta random
variable with first two moments m_1 and m_2. Then the density
is

f(x) = [Γ(a + b)/(Γ(a)Γ(b))] x^(a−1) (1 − x)^(b−1),   0 < x < 1,

with

m_1 = a/(a + b) = 1/(1 + b/a)

m_2 = a(a + 1)/[(a + b + 1)(a + b)] = m_1 (a + 1)/(a + b + 1)

Thus, for fixed m_1 and increasing m_2, the ratio b/a must
be fixed and a + b must decline. That is, b
must decline, which forces more probability into the right
tail. Consider the case m_1 = 0.025, which implies b = 39a.
For a > 1, the density has one mode, at (a − 1)/(a + b − 2). For
a < 1, a mode appears at the right.

Thus, preparation of (β_i, ω_i) instead of the use of (β, ω)
to generate the {π_i} will tend to increase the variability and the
number of large omission rates for the States. This result will
occur whether the job is done well or poorly. This increase
in variability agrees with common desire, but it also needs a
strong empirical base.

The SPRR demographic analysis does not develop for
individual States specific omission rates that are strongly
favored as being correct or better than the synthetic estimates.
The analysis works with estimates arising from a variety
of plausible assumptions. As a collection, these estimates
appeal to SPRR because:

1. The variability between States, with the estimates proposed
by SPRR, is substantially larger than the variability
from basic synthetic assumptions.

2. In spite of 1 (above), the SPRR series does not generate
any outrageously large omission rates. (Actually,
the SPRR series appears to me to produce some
excessively small omission rates, including a few (<3)
negative rates.)

3. The range of omission rates for a State generated by
the different assumptions is modest.

4. The correlation between omission rates for the States
for some pairs of sets of assumptions is high (≥0.9).
(The SPRR and synthetic omission rates have a very
low correlation (≤0.2).)

5. The geographical pattern of SPRR omission rates is
consistent with other data.

Although the SPRR State omission rates do not individually
have strong appeal, they do seem appropriate,
along with the synthetic estimates, as useful quantities to
help in the estimation of subnational, possibly sub-State,
population counts.

The following remarks on SPRR methodology will help
indicate why their estimates of omission rates are not individually
favored. This analysis will concentrate on the
white population under 35 years of age; this is the population
segment where the SPRR techniques work best.
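The claim above that adding independent noise to the national rates inflates V(π**) relative to V(π*) can be checked by simulation; the distributions of the State counts and the noise below are invented:

```python
import random

random.seed(7)
beta, omega = 0.075, 0.020   # national rates (illustrative values)

def simulate(noise_sd, n=100_000):
    """Sample pi for random 'States'; noise_sd > 0 adds independent
    State-to-State variation to the national rates."""
    vals = []
    for _ in range(n):
        w = random.uniform(0.4e6, 4e6)     # white count for a random State
        b = random.uniform(0.05e6, 1e6)    # black count
        bi = beta + random.gauss(0, noise_sd)
        wi = omega + random.gauss(0, noise_sd)
        vals.append((bi * b + wi * w) / (b + w))
    m = sum(vals) / n
    return sum((v - m) ** 2 for v in vals) / n

v_star = simulate(0.0)    # basic: constant (beta, omega)  -> pi*
v_star2 = simulate(0.01)  # State-varying rates            -> pi**
print(v_star < v_star2)   # True: the extra noise inflates the variance
```

As the text notes, the inequality appears whether the State-varying rates are estimated well or poorly, so observing it is not evidence that the modification is worthy.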


Insufficient Reason 


"Since empirical evidence cannot be found to support a 
particular weighting scheme, the most practical and techni- 
cally defensible approach is to use a scheme in which the 


two limiting values are given equal weight" (SPRR, p. 26).
This is certainly "practical" in the sense of easy, but there
are even easier things. The "technically defensible" aspect is
obscure. If one of the "limiting values" is too extreme, the
average is a very poor guide. Further, "inasmuch as the biases
went in opposing directions and their relative magnitudes
could not be determined, the two sets of State-of-birth
estimates were averaged with equal weights ..." (p. 26).
SPRR presents these remarks in a harsh and unthinking
manner, but the reader suspects that much thought and
discussion went into these decisions. ". . . in the absence of
evidence regarding the State-of-birth distribution of the
nonresponses, they were assigned according to two limiting
assumptions ..." (p. 26). In this case, SPRR has clearly
given much attention to the problem; see "Data Quality"
below.

This essay is not a review of SPRR; I have not indi- 
cated what SPRR has done. The analysis involves many 
definitions, variables, and decisions. The authors are in 
a "developmental" stage rather than production. They 
hint at each path they started to follow. The steps are com- 
plex, so that one is not encouraged to derive statistical pro- 
perties of the end product from assumptions about the begin- 
ning. Reported high correlation between two series of State 
omission rates might be evidence that one of the series is 
constructed to be a near linear function of the other or that 
presumed statistically independent sources of information 
are not independent. In SPRR, it is not easy to see which, 
if either, is correct. 

Data Quality 

Apparently a major source of variation between esti- 
mated State omission rates is the poor quality of the 
1970 census data on the question regarding State of 
birth. For whites under 35 years old, about 4 percent of the 
data are missing; a substantial amount of the obtained
responses must be wrong (table 11-A). SPRR does much
exploratory work that indicates the missing data are not a
serious problem. The response errors arise from a variety
of reasons. The correct response is the State of residence of
the mother when the child was born. Apparently people often
give the location of the hospital where the birth occurred
or the current State of residence. SPRR does two basic
analyses: (1) Treats the census response as where the birth
occurred (table 11-A, col. 3). (2) Treats the census response
as residence at time of birth (table 11-A, col. 1). The mean
absolute difference between the generated omission rates is
about 1 percent and the average omission rate is about 2 per-
cent. (It should be noted that the 1 percent does not include
the District of Columbia, 78 percent; Maryland, 13 percent;
and Virginia, 6 percent.) SPRR takes the average of these

two columns as the basic estimate of omission rates. Not 
only is the use of "insufficient reason" discomforting, but 
we are in the position of accepting the average of two bad 
observations as being an improvement. We are not sure the 
truth lies between the limits. The two rates for Texas, 4.5 
and 4.3 percent, are close together, and they seem too high. 
All of this is to say that a major source of variation in 
the SPRR omission rates is bad data, which perhaps should 
not have been used. Of course, not to use the data on State 
of birth is a great loss because they are the migration data. 
Without them, demographic analysis apparently cannot 
begin; perhaps they will be better in the 1980 census. 16 


Three stochastic models of the counting and modification 
process are given. These illustrate the possibility of con- 
sistency and some types of assumptions that might be of use 
in practice. 

Model 1. As a part of a Monte Carlo study, one could
generate a stochastic model after the data are collected and
the rates $\beta_i$ and $\omega_i$ are determined by some method. In
particular, the Monte Carlo model could use

$$P(\beta^* = \beta_j) = \frac{b_j}{\sum_{i=1}^{n} b_i}, \qquad P(\omega^* = \omega_k) = \frac{w_k}{\sum_{i=1}^{n} w_i}$$

for each $j$ and $k$ from 1 to $n$. This would not be the com-
plete model, but it is enough for some analysis. Thus,

$$E\beta^* = \frac{\sum_{i=1}^{n} \beta_i b_i}{\sum_{i=1}^{n} b_i} = \beta, \qquad E\omega^* = \omega$$

For the 1970 census, it is assumed that $\beta = 0.077$ and
$\omega = 0.019$.

16 The lack of good internal migration data is thought to be a 
real shortcoming in the U.S. data base. If arguments were made 
to demonstrate the usefulness of such data, perhaps its collection 
could be justified. 
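Model 1 is easy to simulate; in this sketch the state counts and omission rates are invented, and the empirical mean of the drawn rates is compared with the count-weighted national rate:

```python
import random

# Model 1: draw beta* = beta_j with probability b_j / sum_i b_i, so that
# E[beta*] equals the count-weighted national omission rate.
random.seed(1)
b = [500, 1_200, 300, 2_000]        # hypothetical state counts (thousands)
beta = [0.09, 0.06, 0.11, 0.07]     # hypothetical state omission rates

national = sum(bi * ri for bi, ri in zip(b, beta)) / sum(b)
draws = random.choices(beta, weights=b, k=100_000)
empirical = sum(draws) / len(draws)
print(round(national, 4), round(empirical, 4))
```

With 100,000 draws the empirical mean agrees with the weighted national rate to well within sampling error, which is the consistency condition the model is built to satisfy.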


Model 2. In model 1, the omission rates were assigned at
random with probability proportional to the size of the racial
count in the State. Another framework for a Monte Carlo
experiment is to introduce $(b^*, w^*)$ as the counts of the
races for a randomly selected State.17 The assumed device
for generating the random pair $(b^*, w^*)$ is

$$P(b^* = b_i \text{ and } w^* = w_i) = \frac{p_i}{p} \qquad \text{for } i = 1 \text{ to } n$$

(where $p_i = b_i + w_i$ and $p = \sum_i p_i$). That is, the probability
of selecting the State $i$ data is proportional to the count of
State $i$. Also, $(\beta_i, \omega_i)$ have been computed by some method
for each $i$. Define $(\beta^*, \omega^*)$ as the omission rates associated
with $(b^*, w^*)$. Now assume that $(b^*, w^*)$ and $(\beta^*, \omega^*)$
are independent; size and omission rate are independent.
Further, assume $E\beta^* = \beta$ and $E\omega^* = \omega$; that is, the
model satisfies a consistency condition like that in model 1.
Then, with

$$\pi^* = \frac{b^*\beta^* + w^*\omega^*}{b^* + w^*}$$

the following consistency result is obtained:

$$E\pi^* = E\left[\frac{b^*\beta^* + w^*\omega^*}{b^* + w^*}\right]
= E\,E\left[\frac{b^*\beta^* + w^*\omega^*}{b^* + w^*}\,\middle|\,(b^*, w^*)\right]
= E\left[\frac{b^*\beta + w^*\omega}{b^* + w^*}\right]
= \frac{b\beta + w\omega}{b + w} = \pi$$


A consequence of assumptions (1) and (3), below (31)
in section 3, is

$$E\frac{P^*}{P} = E\frac{p^*}{p}$$

The expected proportion of the count in a State is the same
before and after modification. The relationship is derived:

$$E\frac{P^*}{P} = E\frac{p^*(1+\pi^*)}{p(1+\pi)}
= E\frac{p^*}{p}\;E\frac{1+\pi^*}{1+\pi}
= E\frac{p^*}{p}$$

Of course, the modification process will change the popula- 
tion of each State. At the model level, modification designed 
to remove bias does not make a change in the expected 
proportion of the population in a State. At the practical 
level, even with this model, modification might be desirable. 
If the process is effective, modification will give allocations 
closer to congressional intent than would be obtained with 
the original counts. (Here, we are thinking of typical alloca- 
tion methods that depend on population proportions.) Con- 
sequently, the process of modification increases equity. In 
this model, the bias is random, as the effects are random in 
a random-effects model. 


Model 3. Now modify model 2 so that the sizes and omis-
sion rates are dependent. In particular, assume

$$P(\beta^* = \beta_i \text{ and } \omega^* = \omega_i \mid b^* = b_i \text{ and } w^* = w_i) = 1$$

which implies

$$P(\beta^* = \beta_i \text{ and } \omega^* = \omega_i) = \frac{b_i + w_i}{p} = \frac{p_i}{p}$$

Then

$$E\pi^* = E\left[\frac{b^*\beta^* + w^*\omega^*}{b^* + w^*}\right]
= \sum_i \frac{b_i\beta_i + w_i\omega_i}{b_i + w_i}\cdot\frac{b_i + w_i}{p}
= \sum_i \frac{b_i\beta_i + w_i\omega_i}{p} = \pi$$

Again, a desired consistency result is obtained. 

17 Although no subscript is attached to the random variables b*
and w*, the context should make it clear that they relate to the States
and not to the national total. If it were necessary to talk of two
States, a notation such as (b*, w*) and (b'*, w'*) could be used. For
the model about to be described, these vectors would be highly
dependent.
1. Keely, Charles B. "Counting the Uncountable: Estimates
of Undocumented Aliens in the United States."
Population and Development Review, Vol. 3, No. 4
(1977), 473-481.

2. Keyfitz, Nathan. "Information and Allocation: Two
Uses of the 1980 Census," with comments by Harold
Nisselson and Harry V. Roberts. The American Sta-
tistician, Vol. 33, No. 2 (1979), 45-46. (The Bureau
of the Census, during September 5-8, 1979, held a
workshop directly related to proposals suggested by this
paper.)

3. Lindley, D. V., Tversky, A., and Brown, R. V. "On the
Reconciliation of Probability Assessments." Journal of
the Royal Statistical Society, Series A, 142 (1979),
146-180 (with discussion).

4. National Research Council. Counting the People in
1980: An Appraisal of Census Plans. National Academy
of Sciences. Washington, D.C., 1978.

5. U.S. Department of Commerce, Bureau of the Census.
"Estimates of Coverage of Population by Sex, Race,
and Age: Demographic Analysis." Census of Population
and Housing: 1970, Evaluation and Research Program,
PHC(E)-4. Washington, D.C.: U.S. Government Printing
Office, 1973.

6. ______. "Coverage of Population in the 1970 Census
and Some Implications for Public Programs." Current
Population Reports, Series P-23, No. 56. Washington,
D.C.: U.S. Government Printing Office, 1975.

7. ______. "Developmental Estimates of the Coverage of
the Population of States in the 1970 Census: Demo-
graphic Analysis," by Jacob S. Siegel, Jeffrey S. Passel,
Norfleet W. Rives, Jr., and J. Gregory Robinson. Current
Population Reports, Series P-23, No. 65. Washington,
D.C.: U.S. Government Printing Office, 1977.

8. ______. "Summary of Special Censuses Conducted by
the Bureau of the Census between Jan. 1, 1977, and
June 30, 1977," and "Summary of Special Censuses
Conducted by the Bureau of the Census between July 1,
1977, and December 31, 1977." Current Population Re-
ports, Series P-28, Nos. 1557 and 1558, respectively.
Washington, D.C.: U.S. Government Printing Office, 1978.

9. ______. "Coverage of the Hispanic Population of the
United States in the 1970 Census." Current Population
Reports, Series P-23, No. 82. Washington, D.C.: U.S.
Government Printing Office, 1979.

10. U.S. Department of Commerce, Office of Federal Sta-
tistical Policy and Standards. "Report on Statistics for
Allocation of Funds." Statistical Policy Working Paper
1, 1978.


William G. Madow 


Estimates of undercoverage of the national population 
have been made using demographic analysis for the 1950, 
1960, and 1970 censuses of population. 

The need for estimates of population of subnational 
areas in the United States adjusted for undercoverage has 
grown considerably since the 1970 census because of the 
many Federal programs allocating funds on the basis of pop- 
ulation and other census data. 

In 1977, the Bureau of the Census issued a report, re- 
ferred to as SPRR in Savage's paper and in these comments, 
which presented "developmental" estimates, also made by 
demographic analysis, of undercoverage by States. 

Savage's paper consists primarily of a critical review of 
making estimates by demographic analysis, as in SPRR, 
especially when the objective is to modify undercounts, 
and a discussion of why he believes it preferable to use sta- 
tistical methods for these purposes, including Bayesian 
methods. My basic comment on Savage's paper is that, 
although his discussion of SPRR includes a discussion of the 
feasibility of making adequate subnational estimates by 
demographic analysis, it does not include a discussion of the 
feasibility of making such estimates by statistical methods. 
This comment is not necessarily a criticism. I believe Savage 
wishes both to avoid having demographic analysis accepted 
prematurely as the preferred method for making subnational 
estimates merely because such estimates were adopted at 
the national level for the censuses of 1950, 1960, and 1970, 
and to persuade the decisionmakers to make a thorough and 
expensive effort at developing estimates by statistical 
methods intended to calibrate the census counts rather than 
replace them as in demographic analysis. 

The conference indicates that the Bureau of the Census 
is giving careful consideration to both demographic and 
statistical methods of making subnational adjusted estima- 
tions of the 1980 census. SPRR does seem to be a greater 
effort in the development of the demographic analysis than 
has at least so far been visible for statistical approaches. 

Let us turn to some more specific comments on Savage's 
paper. Demographic analysis uses data from outside the 
current census in an attempt to approximate error-free 
census values. Different data are used for approximations to 
different parts of the tables of population by age, sex, and 
race. When an algebraic expression exists, e.g., 

Population = Births − Deaths + Immigration − Emigration

for a reasonable past time period, then accuracy of the popu-
lation estimate is achieved if each of the four terms on the
right is accurate. However, when one or more of the terms on 
the right becomes inaccurate, e.g., immigration, then the 
estimate becomes statistical in the sense that the population 
estimate may be subject to errors depending on the errors 
of the component variables. One method of examining the 
error structure of any random variable that is a function of 
other (component) random variables is to determine the 
variability of the function induced by assigning sets of possi- 
ble values to the component variables. This is done in SPRR. 
One or more probability distributions of these component 
random variables may be assumed based on knowledge, 
judgment, experience, or anything else. Whether or not those
doing demographic analysis make this step, it is available to 
them. Thus I see no reason to contrast demographic and sta- 
tistical analysis; it is only that one of these methods (demo- 
graphic analysis) is sometimes in the position where what 
it is desired to estimate can, with sufficient insight, be ex- 
pressed in terms of essentially error-free component vari- 
ables, i.e., the estimate is made by deterministic methods. 
(Clearly, I mean these statements to be taken relatively, not 
absolutely.) For this reason, Savage's assertion "Estimates 
of population sizes from demographic analysis would be hard 
to accept" is much too strong. 

As SPRR shows, in dealing with subnational estimates, the
component variables are unlikely to be error free, and thus a
major reason for preferring demographic analysis is lost. The
sizes of error must be taken into account. Also, in 1980,
immigration would be more in error than in earlier censuses.

Savage feels that, for problems of undercount, demo- 
graphic analysis has weaknesses because it does not permit 
the testing of hypotheses and yields estimates of under- 
count using data from outside the population census itself. 
These are not serious problems for estimating the undercount 
and adjusting for it if the outside data are accurate. The 
undercount results in biases in important totals rather than 
increased variances; even if at some point (e.g., in synthetic 
estimates) those not counted are treated as though they 
are "missing at random," higher level or marginal totals 
would still be biased, if not adjusted, perhaps to totals ob- 
tained by demographic analysis. To test hypotheses that 
biases are equal implies that unbiased or very good estimates 
of the biases are available. This places requirements on PES 
and matching studies that are unlikely to be satisfied. Thus 
data may not be available for a satisfactory statistical test of 
hypotheses by statistical analysis. It is not at all unusual to 
use "benchmarks" obtained outside a survey to improve 


estimates of totals. What matters, if one is using biases and 
variances as indicators of error, is which alternative estima- 
tion procedure produces the smaller bias and variance. If 
good enough PES and matching studies could be made, 
then good estimates of bias or relative bias could be obtained 
for smaller areas than those for which demographic analysis 
may be expected to provide good estimates of bias. Real- 
istically, achieving adequate quality of PES and matching 
studies for good estimates of bias, especially for smaller 
areas, seems unlikely.

Thus, while I may share Savage's views of what statistical 
models might accomplish under more ideal conditions than 
those we now face, it seems unrealistic to make choices on 
the bases of the images of what statistical methods might 
accomplish rather than intensive studies of the alternative 
approaches. I do want to express my agreement, however,
with Savage's implicit belief that, for subnational estimates,
until more evidence is in, demographic analysis should not
have a preferred position; it is one of several alternatives
that must be studied.


The Census Bureau provided some background to the dis- 
cussion by relating the imputation of persons in the 1970 
census to the undercount issue. It was noted that approxi- 
mately 5 million persons were added to what might be called 
the direct interview population— those who were counted on 
the basis of household responses. The information and basis 
by which several million people were added arose from 
census operations— a check on units designated by enumera- 
tors as being vacant and a check by the post office of housing 
units listed in certain areas of the South. A few million more 
persons were added for mechanical reasons. It was estab- 
lished from field counts, for example, that a given enumer- 
ation district had so many people. In tabulations in the 
processing center, far fewer persons were recorded. On the 
basis of the information, it could be inferred that records 
were lost, some people were lost, and these people were 
"made up." Persons imputed in this way were not part of 
the undercount estimated for 1970. 

More details concerning the 5 million persons that were 
added to the census counts were requested by the con- 
ference participants. There was interest in whether the 
Census Bureau might use information on unit conversions 
to impute housing units in some urban areas, or how the 
Census Bureau determined what numbers of people to 
allocate, for example. It was also suggested that such addi- 
tions were a precedent for the Census Bureau adjusting the 
counts. The Census Bureau indicated that there were about 
5 million persons in the counts for whom one would be un- 
able to obtain further information. The two programs that 
imputed with the least evidence were the national vacancy 
check and the postenumeration post office check. About 
1 million persons were imputed as a result of the national 
vacancy check. That check was implemented in the summer 
of 1970 on a national basis to try to correct for a problem 
that was noted— the enumerators were apparently classifying 
housing units as vacant that were occupied. That check 
yielded about 1 million persons, based on a sample survey 
in which the conversion rates were identified for each area 
and the computer was programmed to convert some vacant
units (the proportion estimated by the survey) to occupied
units and to impute the characteristics of neighboring
households.

In most areas, the census was taken by mail and for these 
areas the post office updated the mailing list. To try to be 
equitable for nonmail areas, the Census Bureau instituted 
this post office check in the South, where the undercount 
traditionally has been the highest. The Bureau gave the post 

office all of the addresses it knew about and the postal 
carriers checked these addresses and identified the ones 
they felt had been left out of the census. Original plans had 
called for processing on a 100-percent basis, but due to the 
volume and lateness, it was felt that this kind of program 
could not be implemented. Therefore, the Bureau visited a 
sample of these to see what proportion of the post office 
reports of missing addresses were left out of the census. The 
rates were found for various areas in the South, and the com- 
puter was programmed to impute this proportion of missing 
addresses in the South. 

A very large part of the 5 million imputations was referred
to as mechanical errors. In processing, things will often
happen to data, even though the questionnaires were fairly
well filled out by the enumerators. One of the techniques
used to handle this problem was to go back to the address 
register, where the number of persons for each questionnaire 
was recorded and to impute that number of persons into the 
census files. Approximately 2.5 million persons were im- 
puted because of mechanical errors. 
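The national vacancy check described above amounts to converting a survey-estimated share of units classified "vacant" to occupied and hot-decking a neighbor's record; a minimal sketch in which the conversion rate and the unit records are all invented:

```python
import random

# Flip a survey-estimated proportion of units classified vacant to
# occupied, imputing persons from the nearest preceding occupied unit
# (a simple hot deck). All data here are invented.
random.seed(4)
conversion_rate = 0.10    # hypothetical rate from the sample survey
units = [{"status": "occupied", "persons": random.randint(1, 5)}
         if random.random() < 0.9 else
         {"status": "vacant", "persons": 0}
         for _ in range(10_000)]

converted = 0
last_occupied = None
for unit in units:
    if unit["status"] == "occupied":
        last_occupied = unit              # most recent hot-deck donor
    elif last_occupied is not None and random.random() < conversion_rate:
        unit["status"] = "occupied"                   # converted unit
        unit["persons"] = last_occupied["persons"]    # hot-decked value
        converted += 1

print(converted, "vacant units converted to occupied")
```

Roughly the survey-estimated fraction of the vacant units is converted, and each converted unit inherits a neighboring household's characteristics, which is the essence of the procedure described.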

Concern was expressed that the design of the PES has still 
not been determined, and the timeliness of the PES was 
questioned with the interview being conducted long after the 
census. Several questions were raised relative to the PES and 
three suggestions were made: (1) The PES could conceivably 
be subcontracted out if the Bureau could deputize an out- 
side agency to handle confidentiality. Thus, independently, 
the PES could be timed to coincide with the census. (2) 
Possibly monetary incentives could be used for response to 
the PES on a relatively small scale at intervals throughout 
the decade. This should differ from the CPS in that it would 
only focus on hard-core census information, which would 
help to study methods for undercount improvement and get 
a better over-all projection of current population estimates. 

The Census Bureau indicated that the PES is well designed, 
at least in terms of how the questionnaire is to be designed, 
how to collect and process the data, a timetable for process- 
ing, and most of the ideas on matching and estimation. Over 
the past 2 to 3 years, the Census Bureau has been conducting
pretests and has noted that PES techniques seem not to
produce adequate data to estimate the undercount. As a
result, many things have been tried to overcome what appear
to be some of the deficiencies in PES conclusions. One of
these is that there is a correlation bias between the PES and 
the census; in other words, a person left out of the census 
also tends to be left out of PES, perhaps for some of the 
same reasons, such as deliberate concealment. 



The Census Bureau agreed that if the PES were conducted 
on the same day as the census, or shortly thereafter, some of 
the problems may be minimized. Two additional problems 
may be expected, however. The first is that if people are 
enumerated in the census and the PES too closely together, 
it is likely to bias either the census or the PES— likely to con- 
dition responses— and thus make the estimated undercount 
even more biased. The second is that the census is more 
important than the PES, therefore, the PES should not be 
allowed to affect the census counts. 

With respect to the use of monetary incentives, this ap- 
proach was studied as part of the 1972 Consumer Expendi- 
ture Survey, and the results indicated that the money is 
better spent on enumerator training. 

The Census Bureau also discussed the matching for the 
PES. The problems of using administrative records are well 
known; in doing a match of two lists using name, sex, age, 
and race, both matched and nonmatched cases may be in 
error, and this problem is magnified when one tries to match 
perfectly. The possibility for errors resulting from coding 
the sample cases also was mentioned. The Bureau would 
like to guard against developing undercount figures that are 
really describing a mismatch rate. 

The Census Bureau then questioned the conference par- 
ticipants as to their views on administrative record match- 
ing. The Bureau envisions two solutions if one were not sat- 
isfied with dual-system estimation: One is to undertake 
triple-system instead of dual-system estimation, although 
it was not felt that triple-system would yield significantly 
more information than dual-system estimation. The second 
way would be to do a second dual-system estimation using 
the PES and administrative records. 

It was indicated by some that two dual systems would 
be more feasible in reducing matching problems, but that 
there could be problems with that approach. The main 
problem is bias. The correlation bias of the PES is well 
known, but not as much is known about the bias of admini- 
strative records themselves; and the sum of the two biases 
remains uninvestigated. It was thought that, in terms of 
picking a sample estimate, it is an issue of which of the 
two biases is likely to be less. 

The point was emphasized that procedures used by the 
Census Bureau in imputation are based on the probability 
of categories of households or the probability of people 
being missed. What ought to be done with dual- or triple-
system estimation is to estimate as well as possible what the
miss rates are for important categories of people and, if
that can be done, the issue of the two Siegel papers could
be resolved. That is, is it true that blacks and whites in differ-
ent age-sex groups are missed at nationally consistent rates,

or is it a fact that certain places where blacks happen to 
live are hard to enumerate? It was speculated that there are 
particular areas that are hard to enumerate, and blacks and 
Hispanics live there. 

The Census Bureau indicated that in triple-system estima- 
tion the information obtained from a three-way match yields 
information for seven out of eight cells representing the pop- 
ulation—the eighth cell representing those persons not on any 
of the three lists. An estimate of the eighth cell can be made 
either by taking into account the other seven cells, or by col-
lapsing the information into three two-way tables of four
cells each (the fourth cell of each being empty) and averag-
ing the three estimates of the vacant cells. The difficulty with
triple-system estimation is that while it reduces the bias, 
there is also an increase in variance. 
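The dual-system estimator, and the collapsing route to the unobserved eighth cell in triple-system estimation, can be sketched as follows; the cell counts are invented and correspond to three independent lists covering a population of 1,000:

```python
# Dual-system (capture-recapture) estimate: n1 on list 1, n2 on list 2,
# m matched on both; assuming independent inclusion, N_hat = n1 * n2 / m.
def dual_system(n1, n2, m):
    return n1 * n2 / m

# Triple-system: x[(a, b, c)] counts people by presence (1) or absence (0)
# on lists A, B, C; the cell (0, 0, 0) is unobservable. Collapse over each
# list in turn, fill the vacant cell of the resulting 2x2 table by
# independence, and subtract the observed part of that collapsed cell.
def triple_system_x000(x):
    estimates = []
    for axis in range(3):
        def cell(i, j):
            total = 0
            for k in (0, 1):
                key = [i, j]
                key.insert(axis, k)
                if tuple(key) != (0, 0, 0):
                    total += x[tuple(key)]
            return total
        estimates.append(cell(1, 0) * cell(0, 1) / cell(1, 1) - cell(0, 0))
    return sum(estimates) / len(estimates)   # average the three estimates

# Invented counts for three independent lists (true total 1,000):
x = {(1, 1, 1): 504, (1, 1, 0): 216, (1, 0, 1): 126, (0, 1, 1): 56,
     (1, 0, 0): 54, (0, 1, 0): 24, (0, 0, 1): 14}

print(dual_system(900, 800, 720))   # lists A and B alone
print(triple_system_x000(x))        # estimate of the unobserved cell
```

With truly independent lists the three collapsed estimates agree; correlation bias among the lists (a person missed by one list being more likely to be missed by the others) is exactly what makes them diverge in practice.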

The Bureau also argued that while direct estimates can be 
used for sampled areas, regression or synthetic estimation 
must be used for the thousands of areas not in the sample. 
The problem with using synthetic methods is that to get 
a good fit, so many degrees of freedom may be used that the 
variance may rise rapidly. On the other hand, one could run 
a curve through the data, which is not a problem if one is 
thinking of a linear curve with eight variables; but the curves 
may be curvilinear, involving the products of three of the 
variables. Use of regression to smooth out difficulties was 
considered, but a suitable form for the regression could not 
be suggested. 

Questions then arose concerning the possible scope of the 
adjustments. It was noted that the discussion thus far had 
focused only on population adjustments. Other factors such 
as income are important factors for Federal fund distribu- 
tion, and it was speculated what assumptions would be made 
about other population characteristics if there are population 
adjustments. The Census Bureau responded that it has pub- 
lished reports showing the effects using revenue-sharing 
formulas of adjusted population only, per capita income, 
and all components— population, per capita income, and 
income. For local areas, the formula reduces virtually to an 
adjustment of per capita income squared and, except for 
certain areas constrained because of limits placed on per 
capita allocations, the adjustment eliminates any effect of 
population at the local level. Thus, in some sense, for most 
areas the adjustment for population becomes an academic 
issue when one is talking about adjusting the data in the 
revenue-sharing formula for local areas. 

While it seems counterintuitive, population adjustments 
have virtually no effect on the distributions in the general 
revenue sharing allocation system. The income adjustment 
"drowns out" the population adjustment. In addition, most 
places lose money when adjustment is made for undercount. 

Considerations II 

Diverse Adjustments for Missing Data

Leslie Kish 

University of Michigan 


Though ours is a conference on census undercounts, it 
cannot stand in complete isolation from other kinds of miss- 
ing data. With regard both to its sources and to its remedies, 
it can and should be linked to what we know about the 
diverse types of missing data, and also to what more we need 
to know about them. 

The problems of undercounts are neither entirely similar 
to nor entirely different from other problems of missing 
data. This is worth stating because both those extremes have 
partisans who hold them with confidence, also with some 
justification. One extreme position claims that, since under- 
counts are merely one type of missing data, the justifications 
and the methods, adjustments, weightings, and imputations, 
which are used for other types of missing data, should also 
be applicable to undercounts. The other extreme holds that 
census undercounts must be seen as a completely different 
problem, because both politically and basically census data 
are and must remain the standards of comparison, hence 
attempts to use auxiliary data are wrong as well as difficult 
or futile. 

Both extremes have been ably argued in this conference. 
However, most of us, I believe, have come to hold a position 
between those extremes, for reasons noted below. And such 
compromise positions suggest that the Census Bureau should 
publish (in due time) population estimates adjusted for 
estimates of noncoverage, but also suggest that those esti- 
mates be clearly distinguished from the census counts, which 
should continue to denote persons and units for whom direct 
evidence was perceived in the census enumeration. 

Some other related beliefs and matters may be treated 
merely in passing here, though they have been or are impor- 
tant in other times or contexts. First, it is not common any 
more to believe that census counts are free from errors and 
omissions, or can be so made either by definition or by force 
of legal requirements. The courageous disclosure policy of 
the Census Bureau helped to dispel that myth.

Second, we do not now believe that, with reasonable
effort and expenditure, the undercounts can be reduced
to the vanishing point, or even perhaps to any worthwhile
extent [1]. There is less agreement about the possibility 
that some thorough, skillful, and expensive methods, applied 
to samples, could perhaps measure and estimate the under- 
counts reasonably well. Some sort of postenumeration 
survey, together with some dual records systems, are usually 
proposed here. 

Third, we are willing to consider the problems of popula- 
tion separately from other errors of observations. Never- 
theless, relations to other forms of missing data must be 
noted below. Furthermore, relationships to errors of observa- 
tions have been noted in references to "faulty and missing 
data" in the remarks of Director Barabba here and in the title 
of an American Statistical Association session in 1978. 
Also, powerful arguments have been advanced here about the 
link of misstatements about income to population under- 
counts for equity formulas. We must remain aware that the 
separation of population undercounts from other errors 
is an artifact for the sake of simplicity. Corrections of various 
kinds (editing, imputing, weighting) are commonly accepted
both in censuses and samples for diverse types of nonre-
sponse and for errors of response and observation, discussed
below.

Fourth, the effects due to the obsolescence of censuses 
should be considered separately, though they seldom are, and 
also were largely neglected in this conference. They will 
be mentioned briefly later in connection with censal and 
postcensal estimates. 

Fifth, we do not expect good administrative registers and 
records to replace the need for censuses in the United States 
in the near future. However, the statistical profession remains 
neutral concerning their desirability. It is naive to believe 
that good population registers belong to authoritarian 
regimes. Actually, Scandinavia has the best registers, and in 
1980, Denmark will no longer take a census, Norway will take 
its last one, and Sweden is debating their need. 

It is in this rich context of errors of diverse types that we 
should look at the problems of undercounts in connection 
with other sources of missing data. My experience in reading, 
listening, and discussions has been that a great deal of con- 
fusion and some controversies arise simply for lack of clear 
terminology among the diverse sources of missing data. We 
lack agreement on the terminology for missing data (and for 
many other problem areas), but I hope the following will 
cover most common usage. 


We begin even here with problems of terminology. Terms 
other than the two above have been used for the entire class 
and sometimes "nonresponse," but we prefer to reserve this 
for one of the categories below in this general class. Note 
that as we proceed from 1 to 4, we descend from high levels 
of knowledge about the missing units to low levels, as our 
ignorance increases. 



1. Item nonresponse, not ascertained (NA), or not stated 
refer to specific variables: answers missing for an ele- 
ment (individual) for whom other observations have 
been accepted. Reasons for NA may be numerous: 
refusal or incapacity on the part of the respondent,
omissions on the part of the enumerator, or unusable or
invalid answers that are voided in the editing process. 
This last item links the problems of missing units to 
errors in observations. 

2. Total nonresponse, or simply nonresponse, may be due 
to refusals; to not-at-homes (NAH) after appropriate 
attempts; inabilities of various kinds; not located, 
which depends on specific procedures; or lost or 
destroyed schedules. (We heard here about the many 
schedules "chewed up" by the machines.) The total 
nonresponse may refer to the individual element 
(person) or to its group (household). 

3. Cluster nonresponse refers to observations missing for 
larger units, such as entire areas or establishments, 
due to refusals, inaccessibility, absence of respondent, 
etc. Its sources, occurrences, effects, and treatment 
differ from the others. 

4. Noncoverage or incomplete frame denotes failure to
include some units of the defined population(s) in the
actual operational frame. The failures come from 
faulty preparation of the frame, from faulty execution 
of enumeration procedures, and from faulty responses. 
The net noncoverage refers to the excess of non- 
coverage over overcoverage, and the gross coverage 
error can refer to the sum of the absolute values of the 
two coverage errors. The census undercount is a special 
case of net undercoverage. 

5. Deliberate and explicit exclusions of sections of the 
population differ from noncoverage, also from non- 
response. Their sources, effects, and treatments are 
all different from those of the other four forms. 
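The arithmetic behind net and gross coverage error in item 4 above can be sketched with hypothetical rates:

```python
# Hypothetical coverage rates (percent of the true population).
noncoverage = 3.0   # persons missed by the operational frame
overcoverage = 0.8  # persons erroneously included (duplicates, etc.)

# Net noncoverage: excess of noncoverage over overcoverage.
net_noncoverage = noncoverage - overcoverage

# Gross coverage error: sum of the absolute values of the two errors.
gross_coverage_error = abs(noncoverage) + abs(overcoverage)

print(net_noncoverage, gross_coverage_error)
```

The two measures answer different questions: net noncoverage describes the bias in the total count, while the gross error describes the total volume of coverage mistakes.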

For item nonresponse, evidence from the field indicates 
existence of individuals, plus some (usually most) of 
their characteristics. From relationships with these vari- 
ables, the missing item is imputed with some (though not 
total) confidence and objectivity. For total nonresponse, 
only the existence of the unit is ascertained, not its charac- 
teristics. When the nonresponding unit is the household, 
census enumeration of numbers of persons cannot be done 
directly, and those very numbers, as well as their character- 
istics, are imputed from "similar" households. Even this 
imputation is based on field ascertainment of the existence 
of the household. Hence, all of these adjustments and
imputations have been properly considered as belonging to
the census counts.

However, noncoverage differs from the three classes of 
nonresponse above. It results in more difficult problems 
because of ignorance of the very existence of the missing 
units. The problems are intractable within the framework of
the collection procedures of the census enumeration. For
evidence about them, methods must go beyond those collec- 
tion procedures to other expensive methods, to models 
(demographic and other), and to subjective bases. Hence, 
adjustments for noncoverage should be considered as 
resulting in estimates rather than in census counts. 

It seems that the diverse types of missing units also tend 
to have different sources in the population. Item nonresponse 
tends to be higher in populations which may be termed "less 
developed"; in less developed countries, among rural and less 
educated groups, and among nonmembers of the labor force. 
In contrast, total nonresponse, refusals, and NAH's tend to 
be higher among the cosmopolitan, urbanized, educated, and 
labor-force members. Noncoverage, however, is most com- 
mon among the most mobile, least settled portions, espe- 
cially among young males (particularly poor young males), 
among migrants, and in mobile occupations. In the United 
States, it is especially high in centers of large cities (though 
more rural in many countries), among blacks, Chicanos, and 
some other ethnic groups. The quality of the enumeration
is important; so are procedures, training, and budgets. These
must be applied separately to achieve low rates of noncoverage,
nonresponse, and item nonresponse, and to support callbacks,
repeat enumerations, checks, editing, and imputing.


It is common to concentrate technical discussions on one 
kind of statistics and. to neglect others. It is also common to 
focus on only one type of missing data and neglect other 
types, even though the different types have diverse effects 
on the different kinds of statistics. These very differences 
should warn us to consider briefly the major kinds of
statistics.

1. Simple totals denote counts of persons, households, 
etc.; subclasses by age, sex, ethnic category, etc.; also, 
sums of variables beyond simple counts, such as 
acreages and incomes. We should also distinguish 
effects not only on national totals and on major 
domains, but also in small local areas and other small 
domains. The relative effects of different kinds of 
missing units and of undercounts can be very different 
for different sizes of domains. 

Noncoverage has the effect of imputing y_i = 0 instead
of the actual, individual values, and nonresponse has a
similar effect, if uncompensated. Overall adjustments 
from demographic and similar models can be made 
sometimes reasonably well for the national totals but 
not for local areas. These latter suffer both from small 
sizes, and also from lack of enclosed populations; 
hence, from transmigrations across boundaries. 

2. Ratio means and averages (means and medians), includ- 
ing means for subclasses, are less affected than simple 


totals by noncoverage and nonresponse to the degree 
that these resemble the response portions. But they are 
affected to the extent that the noncoverage and 
nonresponse cases tend to differ from the responses; 
they suffer from differential nonresponses if these are 
not adjusted or compensated. This advantage of ratio 
means can be transferred to simple totals by ratio 
estimation, but only to the extent that the size of the 
noncovered population is known or can be estimated. 
And this is especially difficult to do for small areas. 
3. Multivariate relations tend to be affected differently 
than ratio means and simple totals. Here we include 
common comparisons between subclass means, as well 
as more complex analytical statistics, such as regres- 
sions. Like ratio means, relations tend not to be 
affected by "average" kinds of nonresponse or non- 
coverage. And they tend to be even less affected than 
ratio means to the extent that differential effects, if 
they exist, tend to cancel (because effects are of the 
"additive" kind). These are useful overgeneralizations 
to which exceptions can be found. 

However, multivariate relations can suffer greatly 
from relatively small portions of item nonresponses 
on each of the several variables involved in the 
multivariate relations; those small portions tend to 
cumulate toward their sum. Thus, omitting even small 
portions of NA's can become damaging to multivariate 
statistics; hence, imputation may be preferred. 
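A small simulation illustrates how modest per-item NA rates cumulate under complete-case (listwise) deletion; the rates and variable count here are hypothetical:

```python
import random

random.seed(0)
n, k, p_na = 10_000, 5, 0.05  # 5 variables, each with a 5% NA rate

complete = 0
for _ in range(n):
    # A record survives listwise deletion only if every item responds.
    if all(random.random() > p_na for _ in range(k)):
        complete += 1

# Expected retention is (1 - p_na)**k, about 0.774: roughly 23% of
# records are lost to the multivariate analysis even though no single
# item loses more than 5%.
print(complete / n, round((1 - p_na) ** k, 4))
```

This is why imputation, rather than omission, is often preferred for multivariate statistics.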


These raise special difficulties that are different from 
those of the other types of missing units noted in section 2. 
Item nonresponse occurs usually in situations where a great 
deal of information is available for the individual. Procedures 
can be designed, similar to those for editing faulty data, 
which permit imputing more or less reasonable values for the 
missing data. For total nonresponse, only the existence of the 
unit, either the person or the household, is ascertained. The 
values for the unit are then imputed with more or less 
success. For household nonresponse, even the number of 
persons is imputed. 

For noncoverage, the problems are more difficult because 
the number of units, persons, and households are themselves 
unknown; thus, the difficulty is greater than for nonre-
sponse. Even this estimation can be done reasonably well if
good auxiliary variables are available, as with poststratification for
the CPS and other samples. Those estimates rely on vital 
registry data based on the last decennial census. 
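Poststratification of the kind used for the CPS can be sketched as follows; the strata and control totals are invented for illustration:

```python
# Hypothetical strata with respondent counts and external population
# controls (e.g., census-based totals carried forward by vital records).
respondents = {"under_30": 400, "30_and_over": 600}
controls = {"under_30": 45_000, "30_and_over": 55_000}

# Each respondent in a stratum receives weight control / sample size,
# so weighted stratum totals match the external controls exactly.
weights = {s: controls[s] / respondents[s] for s in respondents}
weighted_total = sum(weights[s] * respondents[s] for s in respondents)

print(weights, weighted_total)
```

The adjustment compensates for differential nonresponse and noncoverage only to the extent that the missed persons resemble the respondents within each stratum.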

But undercounts in the census itself raise special problems
because they need that very outside support which the decennial
census must provide both for itself and for other
samples. Yet from reasonable models, which must depend on
stability in transmigrations and in life processes, reasonable 

adjustments are feasible for national totals and for larger 
domains. However, for small domains and for local areas, the 
problems become much more difficult. Nevertheless, meth- 
ods for small domains estimation do exist, have been im- 
proved recently, and can be applied here. 

Adjustments of the census data require two phases. First, 
information must be obtained from sources outside the 
census itself. Clearly, this must be difficult and perhaps 
expensive; otherwise, the data would be collected from the 
census itself. There are three major methods which may be 
alternatives or used in combination for more reliability. 

1. Demographic analysis depends mainly on data from 
past censuses and from vital registers; also on models 
of stability in vital rates, and of large changes in 
missing rates as persons age. 

2. Postenumeration surveys attempt to find larger 
and/or different portions of the population— an 
assumption that cannot be verified entirely. 

3. Checks of diverse administrative records and registers 
can be made alone, or as part of a dual (or triple) 
records system [4] . 

Second, since the information obtained in the first phase 
would pertain only to major classes (demographic, ethnic, 
sex, etc.), it needs to be distributed to local areas and to other 
small domains, where it is most needed. This can be done 
with several methods which are generally known as "small 
domain estimation" or "postcensal estimates." They are of 
several types [6, 7] , and some of these may also be useful 
for distributing to local areas information obtained on a 
national or large scale. (1) Symptomatic accounting techni- 
ques may be used that utilize current data from administra- 
tive registers in combination with their statistical relation- 
ships based on earlier census data. Techniques using diverse 
registers have been developed by the Census Bureau. (2) 
Synthetic estimation (presented by Hill in this publication) 
refers to ratio estimates combining data from recent samples 
with census data for small areas. (3) Regression methods 
(given by Ericksen, also in this publication) use multiple 
regressions of census counts on data from registers with post- 
censal data from registers and from samples. (4) Bayesian 
methods can have great variety and flexibility that depend 
on subjective choices of models and parameters. (5) Iterative 
proportional fitting refers to flexible methods of categorical 
data analysis utilizing the strength of modern computers. An 
allocation structure establishes relationships between census 
data and associated variables for cells of small domains. 
Then an allocation structure of the associated variables for 
various marginal summations is used to readjust the data in 
the small cells. It allows for much more flexibility than the 
assumptions of linear relations in either synthetic or regres- 
sion methods. 
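As a sketch of iterative proportional fitting, the following rescales a small hypothetical table of census cells so that its margins match externally adjusted totals, while preserving the interaction structure of the initial cells:

```python
def ipf(table, row_targets, col_targets, iters=50):
    """Alternately rescale rows and columns until both margins fit."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, target in enumerate(row_targets):     # fit row margins
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_targets):     # fit column margins
            s = sum(t[i][j] for i in range(len(t)))
            for i in range(len(t)):
                t[i][j] *= target / s
    return t

# Hypothetical 2x2 table: two small areas by two population groups,
# with marginal targets taken from (invented) adjusted totals.
cells = [[40.0, 10.0],
         [20.0, 30.0]]
adjusted = ipf(cells, row_targets=[55.0, 53.0], col_targets=[65.0, 43.0])
print([[round(x, 2) for x in row] for row in adjusted])
```

Note that the cell interactions come from the initial table, so no linearity assumption is imposed on the relationship between the margins and the cells.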

We must note that both phases— first, getting data and 

models for types, classes, and categories of missing persons, 


and then distributing them to small domains— involve models 
and subjective judgments. The extent and kind of subjec-
tivity involved can vary, and differs somewhat
from that involved in imputing and editing for other kinds
of missing and faulty data. The differences are not absolute, 
but they are not trivial either. (Perhaps these differences
resemble those between "direct" and "circumstantial"
evidence.)

Policy decisions concerning adjustments involve questions 
both of statistical principles and public policy. Policy ques- 
tions arise from the subjective nature of adjustments. Be- 
cause they are subjective, they could be matters for litigation 
and even for abuse, which could destroy the integrity or at 
least the credibility of an independent, objective, national 
statistical service. 

Clearly, there are judgments involved in decennial cen- 
suses. The choice of 10 years for census periods, instead of 
either 1 or 100, for example, depends on unstated models 
about the stability of census variables. So does the census 
date of April 1, which has effects on distributions of popu- 
lations, both de facto and de jure. These models of fluctua-
tions should be explored elsewhere in terms of models of
stability over time.

Three decisions concerning adjustments have large policy
implications.

1. Should there be an official adjusted figure? Of course, 
there are unofficial adjustments; likewise, continued 
use of an unadjusted decennial census for up to 14 
years later is also a decision. 

2. Where should the office for adjustments be located? 
In the Census Bureau or elsewhere? 

3. How far should adjustments go? There seem to be sev- 
eral criteria (as usual) contending for primary attention. 
There are simplicity, understandability, and objec- 
tivity. Then there are "equity," in some sense, and 
"accuracy," which may be put as some average of some 
error function, perhaps a mean squared error. The 
former set may be in conflict with the latter; that is, 
objective, simple adjustments may be small and not 
bring as much equity and accuracy as subjective, more 
complex adjustments. 

It seems that possible decisions concerning the census 
undercount may be placed roughly into four levels of action. 

1. Accept the data essentially as obtained in the field by
the enumerators after reasonable supervision and 
editing in the field. 

2. Add the Census Bureau's present methods for editing, 
processing, checking, and imputing. 

3. Adjust according to an agreed "convention" (as 
Keyfitz puts it well) to bring national and domain 
estimates up to the best available model for the true
population.

4. Further investigate and design methods of imputation 
for improved adjustments for multivariate as well as 
univariate relations. 

Four levels may be too many. Probably we need not go 
back to the purity of level 1 and we may temporarily con- 
sider 4 only for methodological investigations. However, 
both levels 2 and 3 can be made available for diverse
uses.

I hope the following proposals have wide acceptance at 
the conference and outside as well. They also accord well 
with decisions in Australia. 

1. The methods currently used and planned (level 2) 
will be designated as the decennial census counts. 

2. Censal estimates will also be computed in accord with 
estimates of the undercoverage and based on a 
convention [1] agreed to independently of expected
results.

These estimates should be kept distinct in name and 
concept from census counts as noted above. One 
proposal (by Roberts in this publication) is that esti- 
mates be released for the undercounts only, not for the 
adjusted populations [1]. That interesting proposal, 
however, would meet practical obstacles, especially 
in computing multivariate adjustments. 

The estimates would need to bring some compro- 
mise between adjusting for a good part of the under- 
count and simple acceptability. 

3. The locus and responsibility for the estimates should
be assigned to an estimating office that would maintain
statistical, scientific, and ethical independence from 
both the users and the producers of statistics. Probably 
a separate office in the Census Bureau would be best, 
perhaps with an impartial, prestigious advising and 
overseeing board. 

The same office could also be responsible for post- 
censal estimates and even for projections for popula- 
tions. It may be wise to link the censal estimates to
postcensal estimates in methods and in conception, and
for continuity.

4. Decisions to use either the census counts or the censal 
estimates would rest with public decision bodies. It is 
possible that some, like apportionment bodies, would 
choose the census counts. But for other purposes 
(probably for revenue sharing) the censal estimates 
would be chosen. These offices may also prefer post- 
censal estimates to decennial censal estimates. Other 
offices (health, energy, etc.) would make separate
choices.

Furthermore, some public and private bodies would 
prefer some other estimates, perhaps some that are less
"safe," or less objective but more likely to come closer
to the actual, current population. For these, as well as 
for the sake of future improvements, we hope that 
investigations will continue to be pursued vigorously. 

It is important to emphasize again, as some speakers have 
done, that there is no single, stable aim with specified toler- 
ance limit for population estimates. The aims are multi- 
purpose and multisubject, flexible and changing, and desired 
precisions are only vaguely felt and subject to compromise— as 
is usual for actual statistical and sampling designs. 


1. Keyfitz, N. "Information and Allocation: Two Uses of the
1980 Census." American Statistician, 33 (1979), 45-56. 

2. Kish, L. "Samples and Censuses." International Statistical
Review, 47 (1979).

3. Kish, L. "Rotating Samples Instead of Censuses." Asian and
Pacific Census Forum (1979).

4. Marks, E. S.; Seltzer, W.; and Krotki, K. J. Population
Growth Estimation. New York: Population Council, 1974,
Section 7.D.1.

5. National Research Council. Counting the People in 1980. 
Washington: National Academy of Sciences, 1979. 

6. Purcell, N. J., and Kish, L. "Estimation for Small Do- 
mains." Biometrics, 35 (1979), 365-384. 

7. Purcell, N. J., and Kish, L. "Postcensal Estimates for Local 
Areas (or Domains)." International Statistical Review, 

The Analysis of Census Undercount 
From a Postenumeration Survey 

A. P. Dempster and T. J. Tomberlin 

Harvard University 


In this paper we consider methods for the estimation of 
census undercount for subgroups of the population, with 
particular reference to small geographic areas. More specifi- 
cally, an intensive analysis of a postenumeration survey 
(PES) is seen as potentially very informative. Empirical Bayes 
analysis of logistic models with random effects opens up a 
wide range of models which a priori seem to reflect the 
inherent structure in a complex PES and, in addition, could 
lead to improved estimates of census undercount for small 
subgroups. A Bayesian analogue to the simple ratio- 
expansion technique for extrapolating from the PES 
estimates to the population using census data is presented, 
and the extent of uncertainty in the estimates obtained is 
seen as being available through their approximate posterior 
variances. Finally, some comments are made with regard to 
the implications of these proposals on the design of a PES. 


The national census of population is a basic data source 
for allocating legislative representation, for redistributing tax 
revenues, for economic and social planning by businesses and 
government agencies, and for research studies of many kinds. 
Since a census is by definition a complete enumeration, it is a 
paradoxical fact that one of the most challenging aspects of 
census evaluation is to assess the extent and distribution of 
the inevitable incompleteness of the actual enumeration. 
Accuracy for local areas and special subgroups of the 
population is becoming increasingly important, and this in 
turn creates pressure to improve tools and methodology for 
estimating the census undercount of such areas and sub- 
groups. In this paper we concentrate on potential improve- 
ments which may be achieved through detailed analysis and 
modeling of a PES. 

Undercount may be assessed either by demographic 
analysis or by matching studies, the latter category including 
PES methodology. Demographic analysis [23] employs 
aggregate data from sources external to the census as input 
to models which predict the population on census day. The 
external sources include birth and death registrations, immi- 
gration and emigration statistics, previous censuses, and 
administrative records such as Social Security and Medicare 
files. While effective at a national level, demographic analysis 
for subnational areas must rely on internal migration infor- 
mation drawn from the census itself. As noted by Siegel et al.

[23] , the necessary migration data for subdivisions of a State 
are lacking. 

Matching studies require external data collected close in 
time to the census date. Individuals picked up in the external 
data are compared against appropriate census categories, and 
those missing from the census are identified as contributing 
to the undercount. The proportion missed in the matched 
sample is then extrapolated to the census to provide 
estimates of census undercount. Matching can be based on 
observational sources such as administrative records, but a 
well executed PES based on a probability sample can provide 
important scientific advantages in terms of data quality, 
control of coverage, and use of the randomization hypothesis 
for statistical inference. The U.S. Bureau of the Census [22] 
has conducted matching studies after each census since 
1950, including sample surveys as a major component 
of the matching studies in 1950 and 1970. Undercount 
assessment for the 1976 Canadian Census of Population and 
Housing emphasized PES methodology, as described by 
Theroux and Gosselin [20, 21 ] . 

To be worthwhile, a PES must avoid the undercount prob- 
lems of the census itself, both overall and over important 
subgroups. The PES therefore requires intensive and expensive
search procedures, affordable in principle because the PES is
a tiny fraction of the size of the census. We are not concerned in this
paper with techniques for assuring an accurate PES. Our 
analysis is concerned solely with what can be learned from 
an accurate PES. In practice, it would be necessary to 
assess the effects of undercount in the PES on the resultant 
assessment of census undercount, but we do not present here 
formal tools for such second-order undercount assessment.

We primarily address the problem of inferring from a 
thinly spread PES to local areas which need not be included 
in the survey. A typical PES will be a multistage survey with 
a nested structure of primary sampling units (PSU's),
secondary sampling units (SSU's) within PSU's, tertiary 
sampling units (TSU's) within SSU's, and, finally, households 
within TSU's. For example, we might need to make a 
statement about the census undercount of teenage black 
males in a local area SSU within a county PSU not included 
in the postenumeration survey. Since such assessments can 
only be made with uncertainty, we suggest that approximate
Bayesian posterior distributions are appropriate reporting
mechanisms. Our posteriors are roughly described by
posterior means and posterior variances.

This work was facilitated in part by National Science
Foundation Grant MC77-27119.

To carry out Bayesian analysis requires detailed models. 
To find appropriate models requires extensive data analysis 
of the results of the PES. The principles required for such 
modeling are all well known, and the required technology is 
feasible but by no means completely developed at present. 
Part of our goal is to outline the required development of 
these statistical techniques. 

In section 2, we introduce the basic foundation of logistic 
models which permit representation of the probability that 
an individual will be captured in a census as a function of his 
or her characteristics and place of residence. We also 
introduce the basic ratio-expansion device whereby a fitted 
probability for each individual observed in the census can be 
used to create an estimate of the number of missing 
individuals in specific categories. These simple modeling and 
estimation methods are not adequate to produce posterior
distributions from a complex PES, so we proceed to specify
random effects in the logistic models which allow more 
realistic assessments of posterior variability in fitted prob- 
abilities. Also, we show how to extend the ratio-expansion 
estimate of undercount to an approximate posterior dis- 
tribution of undercount. Finally, we comment on some of 
the implications of our modeling framework on the design 
and analysis of a PES. 


The analysis of PES data is directed towards estimating 
both an overall undercount rate and the dependence of 
undercount rate on factors associated with individuals, 
households, and geographic locations. Techniques for 
assessing variation in undercount rate involve models, either 
explicitly or implicitly. We make our models explicit by 
postulating an idealized probability of being missed for each 
individual in the population and then using formal mathe- 
matical models which represent this probability as a function 
of factors associated with each individual. Logistic multiple 
regression [4] provides a large and flexible class of 
models for the representation of individual probabilities. 
Here, we limit discussion to standard logistic models with 
fixed effects only, while later we make the necessary 
extension to random effects, which permits representation of 
intraclass correlation within households or larger units. 
Another potentially important extension, to include 
modeling the probability of missing a whole household unit, 
is mentioned later. 

Currently available estimation techniques for small 
domains have been reviewed recently by Purcell and Kish 
[17]. Although often applied to estimating counts such as 
unemployment or mortality statistics, most of the available 
techniques were designed primarily for continuous variables. 
Examples include the regression models of Ericksen [8] , 
the synthetic estimation techniques of Gonzalez and 

Hoza [10], and the prediction models of Holt, Smith, and 
Tomberlin [12] and Laake [13]. Purcell and Kish note 
that "a categorical data analysis framework appears as a 
logical approach, but it has received little attention to date." 
Contingency table methods have been used mainly for 
rescaling survey data to match margins taken from an 
external data source, such as approximate population 
marginal totals. The basic technique is the iterative propor- 
tional fitting method of Deming and Stephan [5] . Purcell 
[16] defines a general approach including the raking ratio 
methods of Brackstone and Rao [1] and the work of 
Chambers and Feeney [2] , who adjust aggregate survey 
data to match externally obtained small area characteristics. 
Available small area techniques do not apply to the census 
undercount problem, where the essence is the incompleteness
of the external data source, i.e., the census.

Here, we rely completely on techniques designed for 
counted data. The first part of the discussion describes 
logistic models which lead to estimates of undercount rates 
for subgroups of the population. We then proceed to define 
straightforward ratio-expansion techniques which adjust 
census counts upwards to reflect undercount rates estimated 
from the PES, either for the whole census or for subgroups. 

We use the symbol q with appropriate subscripts to
represent the probability that an individual is missed in the
census, and p = 1 - q to denote the complementary prob-
ability of being counted. The subscripts attached to p and q
define levels of factors which affect p and q. For purposes of
illustration, we will assume that categories are defined for sex,
age groups, and race groups represented by subscripts u, v,
and w, respectively, and we will represent the triple (u,v,w)
by the single symbol μ for convenience. We will also assume
that the symbol ν denotes i,j,k,l,m, where i represents PSU, j
represents SSU within PSU, k represents TSU within SSU, l
represents household within TSU, and m represents an
individual within a household.

A typical logistic model might assume the mathematical
form

logit(p_νμ) = θ_μ     (1)

where the logit function is defined by

logit(p) = ln(p/q) = ln(p/(1 - p))     (2)

Note that

logit(p) = ln(p/q) = -ln(q/p) = -logit(q)     (3)

The subscript μ on θ in model (1) indicates that the logit is
allowed to depend on the sex, age, race combination defined
by μ = (u,v,w). No local area effect appears in model (1),
indicating that any variation in undercounts between areas
would be due to variation in the local area distributions
across sex-age-race classes.
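The logit function defined above and the identity logit(p) = -logit(q) can be checked numerically; this is a direct transcription of the definitions, not part of the authors' analysis:

```python
import math

def logit(p):
    # logit(p) = ln(p / (1 - p))
    return math.log(p / (1 - p))

p = 0.8
q = 1 - p

# logit(p) = ln(p/q) = -ln(q/p) = -logit(q)
print(round(logit(p), 6), round(-logit(q), 6))
```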


By introducing a local area effect, the model can be
improved to allow for variation in undercount rates beyond
that which is due to differences in sex-age-race population
distributions. One such model would be

logit(p_νμ) = θ_μ + φ_j(i)     (4)

Here, the geographic parameter φ_j(i) depends only on the jth
SSU within the ith PSU. The additive form θ_μ + φ_j(i) means
that no interaction is permitted between the sex-age-race
effect θ_μ and the local area effect φ_j(i).

A more realistic model might permit the race effect to
depend on the local area, so the term φ_j(i) appearing in
model (4) could be replaced by φ_wj(i), so that model (4)
would become

logit(p_νμ) = θ_μ + φ_wj(i)     (5)



Both models (4) and (5) predict that the undercount rate will 
vary with racial composition, but model (5) bases the 
estimate of such variation on each local area, while model 
(4), like model (1), uses an aggregation of local areas to 
predict the effects of varying racial composition. 

Models (4) and (5) have the advantage of increasing real-
ism, but have the disadvantage that only data within the local
area j(i) can be used to estimate the local area effect φ_j(i) in
model (4) and the local area-race interaction effect φ_wj(i) in
model (5). Furthermore, for either of models (4) or (5), the
possibility of estimation for local areas j(i) not included in
the PES is not immediately obvious. A major purpose of the
random effects models, which we introduce in section 3, is to
permit a compromise between model (1) and models (4) and
(5), so that local data can be used to an appropriate extent
when available.

Another device for improving logistic models is to
introduce covariates into the model. For example, if X_j(i) is
a measure of the wealth of local area j(i), then model (5)
might be revised to

logit(p_νμ) = θ_μ + β_w X_j(i)     (6)

which permits the influence of race on undercount to depend
on the wealth of a local area, and simultaneously improves
the usefulness of the model relative to model (5) because it
applies to any local area for which X_j(i) is available,
including areas not in the PES.
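Hypothetically, a fitted covariate model of this kind could then be evaluated for an area outside the PES; all coefficient values below are invented for illustration:

```python
import math

# Invented coefficients on the logit scale.
theta_mu = 2.2    # sex-age-race effect for the subgroup of interest
beta_w = 0.4      # coefficient linking the race effect to area wealth
x_area = -1.5     # standardized wealth measure, known for every area

# Predicted capture probability for the subgroup in this area.
logit_p = theta_mu + beta_w * x_area
p = 1 / (1 + math.exp(-logit_p))

print(round(p, 3))
```

The key point is that the prediction requires only the covariate value for the area, not PES observations from that area.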

Techniques for estimating parameters in models such as 
(1), (4), (5), and (6) generally require iterative computation, 
but computer programs are widely available. The standard 
methods produce estimates which are maximum likelihood 
under the assumption that the individuals in the PES are 
enumerated or not according to independent binomial
drawings with probabilities p_νμ and q_νμ. Even though the
independence assumption is generally false, the estimates will 

be reasonably good with moderately large samples. The 
reason for the failure of the independent binomial assump- 
tion is that the fixed effects models such as (1), (4), (5), and
(6) cannot incorporate the intraclass correlation among p_νμ
within a household or TSU, or perhaps even within an SSU,
without suffering too much degradation of accuracy due to 
small sample size. It follows that another major reason for 
going to random effects models later is to permit credible 
likelihood analysis and thence credible Bayesian inferences 
about undercount rates. 
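The iterative computation mentioned above can be illustrated with a minimal pure-Python sketch, not any production program: a Newton-Raphson fit of an intercept-only miss-probability logit for one sex-age-race class, with hypothetical tallies u (missed) and c (counted). The maximum likelihood fit reproduces the observed ratio u/c, anticipating equation (8).

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_miss_logit(u, c, iters=25):
    """Newton-Raphson MLE of theta = logit(P(miss)) for one class,
    given u missed and c counted individuals in the PES sample."""
    n = u + c
    theta = 0.0
    for _ in range(iters):
        q = sigmoid(theta)                           # current fitted miss probability
        theta += (u - n * q) / (n * q * (1.0 - q))   # Newton step on the log-likelihood
    return theta

u, c = 37, 463                                       # hypothetical PES tallies
theta_hat = fit_miss_logit(u, c)
q_hat = sigmoid(theta_hat)
p_hat = 1.0 - q_hat
print(q_hat / p_hat, u / c)                          # the fitted odds equal u/c
```

For richer models such as (4)-(6) the same Newton logic generalizes to iteratively reweighted least squares over a design matrix.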

Even without the complication caused by the inappropri- 
ateness of the independent binomial sampling assumption, it 
is not possible to estimate the mean square error of estimates 
provided by reduced models such as those discussed in this 
section. Variance estimates depend on the assumed model 
being true and do not reflect the bias which is inherent in 
estimates based on models with reduced numbers of para- 
meters. Gonzalez and Waksberg [11] consider an ad hoc 
method for obtaining a measure of the mean square error for 
their synthetic estimates. Holt, Smith, and Tomberlin [12] 
derive mathematical expressions for the total mean square 
error of the predictive estimates obtained from the analysis 
of variance type models, but they do not suggest how one 
would estimate these measures. As we show later,
the posterior variances available when more complete, 
random-effects models are used provide measures of the 
reliability of undercount estimates. 

The choice of a specific logistic model depends partly on 
what is plausible a priori and also on what appears to fit the 
PES data. That is, extensive data analysis should be per- 
formed, trying many different models before narrowing the 
choice to models which appear to fit adequately. One way to 
judge adequacy of fit is to look at differences between actual 
undercount rates and fitted undercount rates for subgroups 
of the PES, and to judge whether the differences are large 
enough to affect end uses of the adjusted census figures. The 
other way to judge fit is through significance tests. Signifi- 
cance tests are difficult to use in the present instance because 
the only available tests rely on the independent binomial 
model, which is not valid. Reliable goodness-of-fit tests may be feasible in the context of the random effects models shown later, but the required techniques have not yet been developed.

We now turn to a discussion of the application of PES estimates to census counts. The main points can be illustrated by example. Suppose it is desired to estimate the undercount for a particular sex-age-race group μ = (u,v,w) within a particular SSU indexed by j(i), and suppose that model (5) has been fitted to PES data. Several cases require separate treatment. If the SSU j(i) is included in the PES, then p̂_μj(i) from model (5) is directly fitted from the PES, and the undercount may be directly estimated from

Û_μj(i) = (q̂_μj(i) / p̂_μj(i)) C_μj(i)    (7)

where p̂_μj(i) and q̂_μj(i) = 1 − p̂_μj(i) come from the fitted model and C_μj(i) is the census count in the subgroup identified by μ and j(i). Here we make use of the simple ratio estimate as described by Cochran [3], among others. If the ith PSU is included in the PES, but SSU j(i) is not sampled, then it is necessary to replace ψ_μj(i) in the fitted model (5) by an average over the fitted values obtained for SSU's j(i) sampled within the ith PSU, so that a fitted p̂_μj(i) is obtained and model (7) can be used. Finally, if the ith PSU is omitted from the PES, it is necessary to replace ψ_μj(i) in the fitted model (5) by an average over sampled PSU's.
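The three cases above can be sketched schematically as follows; the area labels and fitted q/p odds are hypothetical placeholders for values that would come from the fitted model (5).

```python
def undercount_estimate(census_count, area, fitted_odds, siblings):
    """Ratio-expansion estimate (q/p) * C as in equation (7).
    fitted_odds maps PES-sampled area labels to fitted q/p values;
    siblings lists areas to average over when `area` was not sampled."""
    if area in fitted_odds:                        # case 1: area is in the PES
        odds = fitted_odds[area]
    else:                                          # cases 2-3: average fitted values
        sampled = [fitted_odds[a] for a in siblings if a in fitted_odds]
        odds = sum(sampled) / len(sampled)
    return odds * census_count

fitted = {"SSU-1": 0.08, "SSU-2": 0.05}            # hypothetical fitted q/p odds
in_pes = undercount_estimate(1000, "SSU-1", fitted, [])
out_pes = undercount_estimate(1000, "SSU-3", fitted, ["SSU-1", "SSU-2"])
print(in_pes, out_pes)
```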

Two further points may be noted. First, to obtain estimates of undercount for aggregated subgroups, the principle is to estimate for the smallest subgroups for which the model provides fitted p̂ and then aggregate the estimated undercounts. For example, suppose we fit model (1), where the logits depend only on the sex-age-race class μ. Using maximum likelihood techniques, we can easily show that the ratio q_μ/p_μ can be estimated by

q̂_μ / p̂_μ = u_μ / c_μ    (8)

Here, u_μ and c_μ are the numbers of missed and counted individuals in the whole sample who are members of the sex-age-race class μ. The undercount for the ith PSU (whether or not it is in the PES) is estimated by

Û_i = Σ_μ (u_μ / c_μ) C_μi    (9)

where C_μi is the census count for the subgroup μ in the ith PSU. It should be noted that this model is very similar to model III of Holt, Smith, and Tomberlin [12], and, like that model, it leads to an estimator very similar to the synthetic estimator of Gonzalez and Hoza [10].

Second, a disaggregated estimate such as model (7) or 
model (9) can be improved, perhaps only slightly, by using 
the predictive principle of finite population estimation 
introduced by Royall [19]. The idea is that actual under- 
count is known precisely for individuals included in the 
PES, so that the ratio estimation principle embodied in 
models (7) and (9) need be applied only to individuals in the 
census but not in the PES. The final estimate of undercount 
then comes from summing the number of actual missed 
individuals from the PES and the estimated number from 
model (7) or (9) applied to the non-PES counts. 
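The predictive refinement can be sketched as follows, with hypothetical subgroup figures; only the non-PES portion of the census count is ratio-expanded.

```python
def predictive_undercount(pes_missed, census_total, census_in_pes, odds):
    """Royall-style predictive estimate: misses observed in the PES count
    as whole units; the ratio expansion (q/p)*C applies only to the
    census persons outside the PES."""
    non_pes = census_total - census_in_pes
    return pes_missed + odds * non_pes

# hypothetical subgroup: 12 misses observed in the PES; 150 of the 2000
# census persons fall in PES areas; fitted q/p odds of 0.06
estimate = predictive_undercount(12, 2000, 150, 0.06)
print(estimate)    # about 123: 12 actual plus 0.06 * 1850 predicted
```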

Once we have learned to improve fixed-effects logistic models such as (1), (4), (5), and (6) by including random effects, as described in the next section, we will be in a position to specify plausible posterior distributions for the p_μ. These posterior distributions in turn make it possible to refine the crude ratio-expansion principles (7) and (9) so as to obtain approximate posterior distributions of undercount, as we explain later.


In this section, we move towards more complex and 
realistic models and at the same time become more Bayesian 
in outlook. The two kinds of change are linked because the 
parameter set grows rapidly as the model becomes complex, 
and can only be managed by considering the parameters as 
random. Such swarms of parameters cannot be estimated by 
classical inference methods, but are amenable to Bayesian 
treatment. The required statistical technology is feasible but 
still in a rather early stage of development for logistic 
models. The estimator here uses empirical Bayes as defined by Robbins [18]. Also, our estimator is similar to the
James-Stein estimator as described, for example, by Efron 
and Morris [7] . 

The basic idea is to include terms in the logistic model (4) which describe variation in p_μ within each of the stages of the multistage PES design. Specifically, we may write

logit(p_μijklm) = α_μ + φ_i + φ_j(i) + φ_k(ij) + φ_l(ijk)    (10)

where the φ_i are regarded as drawn from a N(0, σ²_1) population, the φ_j(i) from a N(0, σ²_2) population, the φ_k(ij) from a N(0, σ²_3) population, and the φ_l(ijk) from a N(0, σ²_4) population. These random effects imply that individuals in a PSU have a common element entering their p_μ, and the same occurs for the nested classes of individuals in a common SSU, common TSU, and finally a common household.

Without further research, it remains unclear how accurately the variances σ²_1, σ²_2, σ²_3, and σ²_4 can be estimated from PES data, nor is it easy to see without repeated analyses of the data what effect different choices of the σ²_i will have on final undercount estimates. The models do, however, capture levels of variation which a priori judgment alone strongly suggests must underlie the PES data.
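A quick simulation, with assumed (not estimated) variance components, illustrates the marginal behavior of model (10): each simulated individual draws a fresh effect at every stage, so the logits scatter about the fixed effect with variance equal to the sum of the four components.

```python
import math, random

random.seed(1)
# assumed variance components for PSU, SSU, TSU, and household levels
SIGMA2 = {"psu": 0.04, "ssu": 0.02, "tsu": 0.02, "hh": 0.10}

def draw_logit(alpha_mu):
    """One individual's logit under model (10), all nested effects redrawn."""
    return alpha_mu + sum(random.gauss(0.0, math.sqrt(v)) for v in SIGMA2.values())

alpha = -2.5                      # hypothetical fixed effect for one class
logits = [draw_logit(alpha) for _ in range(20000)]
mean = sum(logits) / len(logits)
var = sum((x - mean) ** 2 for x in logits) / len(logits)
print(mean, var)   # near -2.5 and near 0.18, the sum of the components
```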

Some advantages of the hierarchical random effects model (10) were mentioned previously. Once values of the σ²_i are tentatively adopted, it becomes possible to introduce corresponding factors into the likelihood analysis and hence produce approximate normal posterior distributions for the logit(p_μ), which automatically and correctly weight undercount frequencies observed at the various levels of the multistage design. For example, the posterior mean of logit(p_μ) for an individual m(ijkl) who appears in the PES automatically uses information from the individual's household, TSU, SSU, and PSU. More remarkably, a posterior mean logit(p_μ) can be found for an individual m(ijkl) not in the PES, and again the PES counts are automatically weighted, where the weighting scheme depends on which if any among i, j(i), k(ij), or l(ijk) appear in the PES. Similarly, we can find posterior variances which appropriately incorporate the available information about each individual.
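The automatic weighting can be illustrated by the simplest normal-normal case, a sketch rather than the full machinery of Laird [14]: a noisy local logit estimate is precision-weighted against the between-area distribution.

```python
def posterior_logit(local_mean, local_var, prior_mean, prior_var):
    """Normal-normal posterior: the local PES logit estimate is
    precision-weighted against the between-area distribution, so
    thinly sampled areas shrink toward the overall level."""
    precision = 1.0 / local_var + 1.0 / prior_var
    w = (1.0 / local_var) / precision              # weight on the local data
    post_mean = w * local_mean + (1.0 - w) * prior_mean
    return post_mean, 1.0 / precision

# hypothetical: noisy local logit -1.8 (variance 0.25) against a
# between-area prior of -2.5 (variance 0.05)
m, v = posterior_logit(-1.8, 0.25, -2.5, 0.05)
print(m, v)     # pulled most of the way toward the prior; variance reduced
```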

The basic mathematical development facilitating approxi- 
mate computation of the required posterior means and 
variances appears in Laird [14]. Some initial experience 
with variance estimation is found in Miao [15]. Neither of 
these papers treats examples of the degree of complexity 
required for a real PES, so that detailed research and 
development will be needed, but the principles are in place. 

Two possible extensions of the random effects model (10) deserve mention. The first of these allows that any of the φ terms appearing in the model can have a dependence on characteristics of the corresponding sampling unit, whether it be household, TSU, SSU, or PSU. The predictive accuracy for local areas with known characteristics is enhanced when the undercount rate shows dependence on available characteristics. Such a technique is used, in essence, by Fay and Herriot [9] in the context of a standard linear regression model.

The second extension would allow interactions between 
fixed and random effects in the model. For example, the race 
effect might be postulated to vary randomly from PSU to 
PSU. Such an interaction random effect would require yet 
another variance component in the model, but it could be 
important to allow for such randomly varying race effects in 
order to obtain realistic posterior variances of minority 
undercounts in local areas. 


The purpose of this section is to follow through the logic of Bayesian estimation of undercount. To be specific, we consider how to estimate the number missed in a particular sex-age-race category μ and a specific local area k(ij). The
estimate is defined to be the posterior expectation of the 
unknown undercount. Since expectation is a linear opera- 
tion, the posterior expectation of a more aggregated under- 
count is found by aggregating the corresponding 
disaggregated posterior expectations. 

The PES contributes to Bayesian estimation in two ways. 
The first way is trivial, namely, if the local area k(ij) is part 
of the PES, then a certain number of individuals are 
identified with probability one as belonging to the under- 
count and contribute themselves to the posterior expected 
undercount as whole units. The more difficult task is to 
estimate the undercount among the remaining individuals not 
picked up in the PES, which is to say, the vast majority. For this latter purpose, the PES contributes a posterior distribution of the unknown p_μ as discussed previously. In order to focus the attention of this section on census data, we assume for most of the discussion that the undercount probabilities p_μ are known. The final estimation step is thus to average the conditional estimates given the p_μ over the posterior distribution of the p_μ.

Suppose that the true population size is N = U + C, where U is the undercount and C is the census count in subgroup k(ij). Let N_μ = U_μ + C_μ be the corresponding counts for the sex-age-race classes, with q_μ the corresponding undercount probabilities. We assume that any individuals picked up in the PES are excluded from these counts. Note that we have dropped the subscripts k(ij) from p_μ, N, U, C, N_μ, U_μ, and C_μ to save space.

If h(U) denotes the joint prior density of the U_μ, then the posterior density of U is proportional to

h(U) Π_μ (U_μ + C_μ choose U_μ) q_μ^U_μ p_μ^C_μ    (11)

Provided the C_μ are reasonably large, the joint prior density has little influence and can be taken to be uniform, so that model (11) becomes a product of negative binomial densities.
Thus, conditional on the C_μ and p_μ, the U_μ are independent negative binomials with means approximately at the ratio expansion value

Û_μ = (q_μ / p_μ) C_μ    (12)

thus connecting Bayesian techniques with standard estimation procedures. On the other hand, model (11) is a standard Bayesian formula appearing, for example, in Draper and Guttman [6]. Thus, the mean of the posterior distribution of U, the undercount for the k(ij)th TSU, is just the sum of the posterior means of the U_μ's, i.e.,

Û = Σ_μ (q_μ / p_μ) C_μ    (13)

which is the separate ratio expansion as defined by Cochran [3], for example.
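The negative binomial form of (11) can be checked numerically; the sketch below assumes a uniform prior and hypothetical values of C and q, and recovers a posterior mean of (C+1)q/p, essentially the ratio expansion (12).

```python
def posterior_undercount_mean(C, q, max_u=400):
    """Posterior mean of undercount U given census count C and miss
    probability q under a uniform prior; pmf(U) is proportional to
    comb(U + C, U) * q**U.  Weights use the ratio recursion
    w(U)/w(U-1) = q * (U + C) / U to avoid enormous binomials."""
    w, total, accum = 1.0, 1.0, 0.0
    for u in range(1, max_u + 1):
        w *= q * (u + C) / u
        total += w
        accum += u * w
    return accum / total

C, q = 500, 0.05
p = 1.0 - q
mean_u = posterior_undercount_mean(C, q)
print(mean_u)   # matches (C + 1) * q / p, close to the ratio expansion C*q/p
```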

Refinements involving smoothing the C_μ or weighting from other strata may be needed if the C_μ are too small. In principle, these require Bayesian modeling of the census counts. No discussion of such modeling is included in this paper.

To summarize, we obtain a basic posterior distribution of U conditional on p_μ. The posterior density thus obtained must be averaged over the posterior of p_μ, which is considered fixed in this development, and also may need to be averaged over the posterior distribution of the C_μ if the C_μ are small enough to introduce sampling error comparable to the sampling error of p_μ from the PES. In practice, normal distributions can be used to approximate the required posterior distributions. If posterior variances are required for the U_μ and their aggregates, then approximate normal covariances are needed for the U_μ both within and between local areas. Considerable bookkeeping is needed to keep track of such covariances, but they are not difficult to compute.


We have argued in this paper that a well designed and 
executed PES is an effective tool for assessing disaggregated 
census undercounts and may be the only trustworthy 
method available for local areas. A necessary complement to 
the PES data is careful and extensive analysis of the data 
directed towards modeling the relationships between under- 
count rate and characteristics of individuals, households, and 
local areas. Such modeling is a necessary prerequisite to 
carrying out Bayesian inference aimed at assessing approxi- 
mate posterior distributions of undercount for special 
subgroups of the population. 

Several mechanisms are available for incorporating non-PES information into the PES analysis. First, if separate estimates of undercount rate are available for local areas, such estimates can be introduced directly into a logistic model, much as the variable X_j(i) was inserted into model (6), except that the coefficient need not depend on μ. Second, if prior information about undercount rates by sex-age-race is available from independent sources such as pilot studies and studies of another census, such prior information may be entered directly into the Bayesian analysis suggested in section 3. What is needed is a way to translate such prior information into a roughly corresponding normal prior distribution of the parameters of models such as (1), (4), (5), and (6).

The Bayesian approach suggested here has implications for 
the design and execution of the PES. For example, our 
analyses require that PES information be saved for detailed 
analysis down to the finest levels of disaggregation; a simple 
requirement, but one apparently not met for the 1976 
Canadian PES. A more subtle implication is that the PES 
should be designed with the widest possible coverage. 
Bayesian estimates for local areas can be expected to differ 
considerably, depending on whether and how extensively the 
local area is represented in the PES. If the local area is not 
represented, then the corresponding estimates must depend 
entirely on data from other local areas, whereas a represented 
local area can give weight to specific data for the local area. 
The traditional frequency theory used to design surveys does 
not come to terms with such differences in posterior 
accuracy, which careful Bayesian analysis will demonstrate. 

Bayesian analysis makes possible a cost/benefit approach 

to the question of how many resources should be devoted to
a PES. For example, it should be possible to assign an 
operationally meaningful cost due to the misallocation of tax 
funds due to inaccurate census figures. Given a posterior 
distribution of correct census figures, one can define an 

improved allocation system which will minimize the mis- 
allocation cost, given the state of knowledge defined by the 
posterior distribution. Hence savings from the PES can be 
estimated, and savings from a larger or smaller PES can also 
be estimated, leading to rational comparisons of different 
designs. It would similarly be possible to consider tradeoffs 
between (a) expenditures to increase census accuracy and (b) 
expenditures to improve census accuracy after the fact using 
a PES. 
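The allocation argument can be illustrated with a toy example: given hypothetical posterior draws of an area's true count and a squared-error misallocation loss, expected loss is minimized by allocating at the posterior mean.

```python
# Hypothetical posterior draws of an area's true count.
draws = [980, 1005, 1010, 1025, 990, 1000, 1015, 995]
post_mean = sum(draws) / len(draws)

def expected_loss(alloc):
    """Expected squared-error misallocation cost over the posterior draws."""
    return sum((alloc - d) ** 2 for d in draws) / len(draws)

# Allocating at the posterior mean minimizes the expected cost;
# allocating at the raw census figure (say 980) costs more.
print(post_mean, expected_loss(post_mean), expected_loss(980))
```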

Our analysis of undercount has relied on the concept of 
the probability p that an individual will be enumerated in 
the census. We are implicitly combining individuals belonging 
to households which are contacted with individuals belonging 
to households which are missed. A further refinement would 
be to make separate estimates of undercount for the two 
classes of individuals. The analysis for the first class would 
parallel that given earlier in the paper, while the latter would 
require models for the probability of missing a household 
considered jointly with household size and composition. We 
do not attempt to pursue details here. 


1. Brackstone, G., and Rao, J. N. K. "Raking Ratio Estimators." Bulletin of the International Statistical Institute (1975) 129-132.

2. Chambers, R. L., and Feeney, G. A. "Log Linear Models 
for Small Area Estimation." Paper presented at the Joint 
Conference of the CSIRO Division of Mathematics and 
Statistics and the Australian Region of the Biometrics 
Society, Newcastle, Australia, Biometrics (1977), ab- 
stract no. 2655. 

3. Cochran, W. G. Sampling Techniques. New York: Wiley.

4. Cox, D.R. The Analysis of Binary Data. New York: 
Halstead, 1970. 

5. Deming, W. E., and Stephan, F. F. "On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals Are Known." Annals of Mathematical Statistics 11 (1940) 427-444.

6. Draper, N., and Guttman, I. "Bayesian Estimation of 
the Binomial Parameter." Technometrics 13 (1971) 

7. Efron, B., and Morris, C. "Data Analysis Using Stein's Estimator and Its Generalizations." Journal of the American Statistical Association 70 (1975) 311-319.

8. Ericksen, E. P. "A Regression Method for Estimating 
Population Change in Local Areas." Journal of the 
American Statistical Association 69 (1974) 867-875. 

9. Fay, R. E., and Herriot, R. A. "Estimates of Income for 
Small Places: An Application of James-Stein Procedures 
to Census Data." Journal of the American Statistical 
Association 74 (1979) 269-277. 

10. Gonzalez, M., and Hoza, C. "Small Area Estimation of Unemployment." Proceedings of the American Statistical Association, Social Statistics Section, 437-443.

11. Gonzalez, M., and Waksberg, J. L. "Estimation of the 
Error of Synthetic Estimates." Unpublished paper pre- 
sented at the first meeting of the International Associa- 
tion of Survey Statisticians, Vienna, Austria, 1975. 

12. Holt, D., Smith, T. M. F., and Tomberlin, T. J. "A Model-Based Approach to Estimation for Small Subgroups of a Population." Journal of the American Statistical Association 74 (1979) 405-410.

13. Laake, P. "A Predictive Approach to Subdomain Estima- 
tion in Finite Populations." Journal of the American 
Statistical Association 74 (1979) 355-358. 

14. Laird, N. "Log-Linear Models with Random Parameters: 
An Empirical Bayes Approach." Ph.D. Dissertation, 
Department of Statistics, Harvard University, 1975. 

15. Miao, L. "An Empirical Bayes Approach to Analysis of 
Inter-Area Variation." Ph.D. Dissertation, Department 
of Statistics, Harvard University, 1977. 

16. Purcell, N. J. "Efficient Small Domain Estimation: A 
Categorical Data Analysis Approach." Ph.D. Dissertation, 
Biostatistics, University of Michigan, 1979. 

17. Purcell, N. J., and Kish, L. "Estimation for Small 
Domains." Biometrics 35 (1979) 365-384. 

18. Robbins, H. "An Empirical Bayes Approach to Statistics." Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. I. Berkeley: University of California Press, 1955, 157-164.

19. Royall, R. "On Finite Population Sampling Theory 
Under Certain Linear Regression Models." Biometrika 57 
(1970) 377-387. 

20. Théroux, G., and Gosselin, J. F. "Parametric Evaluation—1976 Census: Reverse Record Check, Part I: Methodology." Census Survey Methods Division Report, Statistics Canada, 1977.

21. Théroux, G., and Gosselin, J. F. "Reverse Record Check—Basic Results on Population and Household Undercoverage in the 1976 Census." 1976 Census Parametric Evaluation, Final Report No. 7, Census Survey Methods Division Internal Report, Statistics Canada, 1978.

22. U.S. Bureau of the Census. "Coverage of Population in 
the 1970 Census and Some Implications for Public 
Programs." Current Population Reports, Series P-23, 
No. 56. Washington, D.C.: U.S. Government Printing 
Office, 1975. 

23. U.S. Bureau of the Census. "Developmental Estimates of 
the Coverage of the Population of States in the 1970 
Census Demographic Analysis." Current Population 
Reports. Series P-23, No. 65. Washington, D.C.: U.S. 
Government Printing Office, 1977. 

Some Empirical Bayes Approaches to Estimating 
the 1980 Census Undercount for Counties 

Robert E. Fay III 

Bureau of the Census 


The title given this paper reflects the author's original 
intention to submit for the review of this conference a 
detailed plan to form estimates of net census error for 
counties on the basis of data expected from the census 
evaluation program. The author has instead taken a some- 
what different course: To outline for the purposes of other 
researchers the basic scope of the evaluation data; to 
emphasize aspects of the data that may impact on the 
question of small area estimation; and to sketch a possible 
program of estimation that might be developed to produce 
estimates for counties and other sub-State areas. 

Some general comments are first in order. It should be clearly noted that the focus of this paper is on the technical issues associated with the estimation of net census error, as opposed to the policy issues arising from adjustment of the census counts. The author intends that this paper not be construed in any fashion to advocate a position on adjustment.
As a second general comment, the paper proceeds on a 
presumption that there will be tolerance of potentially 
complex estimation procedures, provided that such an 
approach can be shown to have attractive statistical proper- 
ties. Certainly, others at this conference have argued that 
simplicity has particular policy virtues, but this paper will 
view simplicity as an unnecessary constraint in forming 
estimates that fully capture the information given by 
available data. 


The currently envisioned coverage evaluation program for 
the 1980 census comprises three major projects: A Postenu- 
meration Survey (PES), which attempts to measure all 
aspects of census coverage error by a direct household 
survey; measurement of the census coverage of sample 
households in the Current Population Survey (CPS), which 
attempts to represent most but not all aspects of census 
coverage; and an Administrative Record Match Study (ARM), 
which uses a sample of persons to measure the complete- 
ness of coverage of the combined IRS-Medicare files, and, by 
implication, the true total population and concomitant net 
census error. 1 These three programs have already been 

described at this conference, but they will be reviewed here 
from the perspective of their possible utility in making 
estimates of net census error for relatively small geographic 
units such as counties. This review will proceed by first 
making a number of observations about general aspects 
common to all three before noting their individual features. 

Interlocking Nature of the Projects 

Although the three parts of the evaluation program may 
be conceptually separated, they must be interlocked in the 
estimation. Only the PES is designed to stand fully on its 
own merits, at least in theory, as a measure of net census 
error for the entire population. In practice, however, the PES 
may be the most deficient in some aspects, and data from the 
CPS or ARM may be used to remedy its problems. The CPS 
program by itself fails to represent the effect on net census 
error of erroneously included or duplicated households in the 
census counts. 2 The ARM is, in a sense, the most dependent 
of the programs, since its sample will be drawn jointly from 
the CPS and PES. In addition, the ARM will measure the net 
census error for the adult population only. 

The interlocking feature of the design may have important 
implications for small area estimation. For example, a 
potential difficulty arises for any regression method 
attempting to analyze the data on a county level. Both PES 
and CPS are designed with a first-stage selection of counties 
in most States, where the PES selection of counties is 
independent of the CPS selection. Consequently, counties 
with CPS sample only will be biased by the omission of the 
components of overenumeration estimated only in the PES. 3 
Most (or all) regression approaches that have been considered 
in the literature on small area estimation presumed condi- 
tionally unbiased (although with perhaps high variance) 
estimates for the sampled first-stage units. 

The main text of this paper remains essentially as delivered at the conference. By April 1980, the planned evaluation program had been radically altered. A series of footnotes to the text will summarize the most important changes through April 1980.

1 The PES has been eliminated as an independent survey. In its place, a sample (E-sample) will be drawn from the census to represent the components of net error not available from CPS.

2 The E-sample is now intended for this purpose.

3 Tentatively, the E-sample will be selected from the same first-stage counties as CPS, thus in large part eliminating this problem.



A second effect of the interlocking design is that the 
complexities of the estimation will probably induce a 
complex covariance structure among the sample estimates for 
individual counties or other small areas. In turn, a complex 
covariance structure will increase the technical difficulty in 
fully analyzing regression estimates for small areas. 

Scope of the Evaluation Program 

A second comment serves only to caution against viewing the PES-CPS-ARM program as completely settled at this date. As a consequence of discouraging pretest results, the merits of the PES program and its currently projected sample size are under review. Other awaited test results may help to determine the scope of the ARM project, which could be undertaken under a number of options on the use of the CPS and PES samples. This uncertainty, although temporary, affects the current ability to plan detailed estimation procedures, since the sample size and its distribution are not yet fixed.

Variance of the Estimates 

A related comment, with implications similar to those of the preceding point, is that the reliability for a given sample size remains a matter of speculation at this point. This problem extends across all three procedures. For example, our original design assumptions for the PES were based on a presumption of binomial-like variation in census omissions affected by a moderate to large within-household correlation. In fact, however, it appears that several factors, including variation in household size, intraclass correlation between housing units on a block, high correlation within households, and the potentially high absolute level of gross omissions and erroneous enumerations associated with a more moderate net census error, inflate the sampling variance of the estimated net error by substantial amounts. Sampling variances estimated for the pretest results were about 6 to 20 times those that would have been guessed based on our original design assumptions.
Resolution of Conflicting Results 

The coverage evaluation program includes three separate 
projects, partly for reasons of reducing variance, but also 
partly in an attempt to find methods to reduce bias. This 
approach leads naturally to potentially conflicting results; for 
example, for some segments of the population, up to four 
separate estimates of error will be possible: from the PES, 
the April CPS, the August CPS, and the ARM. 4 Resolution 
of systematic conflicts not due to sampling error alone will 
require careful analysis, but a likely outcome will be the 
production of possible alternative sets of estimates. 

This potential conflict transfers directly to small area 
estimation. Much of the analysis of the discrepancies 
between estimates will be, of necessity, on a national basis; 
yet, the critical question for small area estimation will be to 
obtain sample data that correctly represent differences 
among areas (except for sampling variability). This require- 
ment is especially important for regression analysis, but 
synthetic estimation must make equally strong assumptions. 

Timing of the Results 

Timing of the results leads to a complication similar in 
effect to the preceding issue. Optimistically, preliminary 
analysis of the PES and CPS aspects is possible by late 
1981, 5 but the ARM will trail more than a year behind. 
Preliminary estimates based only on the PES and CPS 
components are a possible and perhaps likely outcome; yet, 
these initial data may be improved by the results from the 
ARM a year later. This point, along with the preceding, tends 
to suggest that there will be more than one set of estimates 
of net undercount. 

CPS and PES: Character of the Basic Data 

The major objective of the coverage evaluation program is 
the estimation of the net error of the census figures. The 
survey procedures do not identify specific persons as "net 
missed persons"; rather, separate estimates of gross omissions 
from the census and erroneous inclusions in the census are 
obtained. Consequently, models that state the estimation 
problem in terms of a Bernoulli trial— in or out of the 
census— understate the complexity of the estimation. 

The complexity is compounded by the survey procedures 
that must be used in practice. For example, a single person 
may be treated as both missed from the census and 
erroneously enumerated if the census assigns the person to an 
enumeration district quite far from where the person lives. In 
this instance, the person is both omitted from the correct 
enumeration district and erroneously included in the enu- 
meration district assigned by the census. Unfortunately, 
estimation of the omission and erroneous inclusions come 
from separate samples, so that this hypothetical person may 
be possibly included in the sample estimate for one but not 
the other. Such aspects of the problem tend to increase the 
difficulty of satisfactorily modeling census coverage at the 
level of the person. 

CPS and PES: Nonlinearity 

The estimates of net census error from the CPS and PES 
surveys are nonlinear functions of the estimated erroneous 

4 Under the revised plan, the number is three.

5 The timing appeared as a critical consideration in eliminating the separate PES. Current (April 1980) projections suggest that the CPS results may be available by summer 1981.


omissions and inclusions. This nonlinearity, in turn, will have an insidious effect on our ability to aggregate and disaggregate the estimates of net census error. This problem will have virtually no effect at high levels of geography and demographic detail, such as for estimates of net error for the total population by State, but may soon appear below that level. Considerable care will be required in using either regression or synthetic approaches to accommodate this nonlinearity.
ARM: Complexity of the Small Area Data 

The basic approach taken by ARM is to use a sample of 
persons matched to the administrative record series in order 
to estimate the rate at which persons of different demo- 
graphic characteristics appear in the series. The true national 
population may then be inferred from the administrative 
totals. (The actual process is somewhat more complex but 
conceptually equivalent.) 
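A minimal sketch of this inference follows (group labels, sample sizes, and totals are all hypothetical): appearance rates estimated from the matched sample are divided into the administrative totals.

```python
# Matched sample, by hypothetical demographic group:
# (sample size, number found in the administrative record series).
sample = {"group 1": (500, 450), "group 2": (400, 300)}

# Persons of each group present in the administrative series.
admin_totals = {"group 1": 90_000, "group 2": 45_000}

inferred = {}
for g, (n, found) in sample.items():
    rate = found / n                       # estimated appearance rate
    inferred[g] = admin_totals[g] / rate   # implied true population

print({g: round(v) for g, v in inferred.items()})
```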

The data from the ARM become progressively more 
complex as disaggregation is pursued. At some level, the 
estimation must incorporate the fact that the address present 
in a record series may not be the correct address for census 
purposes. Consequently, the available data form a complex 
matrix relating census and record addresses. Details of this 
estimation remain to be developed. 

Summary of Implications for Small Area Estimation 

The purpose of the preceding sections was to outline some 
of the current difficulties in the evaluation program and to 
touch on a number of considerations that should eventually 
be incorporated into a small area estimation program. 
Certainly, a common theme was the complex nature of the 
data to be obtained. Some points, such as the nonlinearity of 
the estimates, will require careful treatment in time, but for 
the interim may be replaced by simplifying assumptions. 
Other aspects, such as the recognition of the potentially 
complex relationships between omissions and erroneous 
inclusions in the PES and CPS surveys, appear to be 
prerequisites for the formulation of applicable models. 

Regression approaches that take county sample estimates 
as the unit of analysis, and synthetic approaches formulated 
in terms of characteristics of missed (or erroneously 
included) persons, represent common approaches or starting 
points for more complex procedures that have frequently 
been applied to the problem of small area estimation. The 
preceding discussion should suggest that both methods will 
require considerable technical care to be adapted to this 
application. Regression techniques will be affected by the 
potentially high level of the variances, which may limit the 
complexity of the model and the direct evaluation of the fit 
of the model on the basis of the sample data. Biases in the 
sample data that are differential by geographic area may be 
partially mirrored by the regression estimates. The non- 
linearity of the estimation procedures will also have to be 
taken into consideration in forming the sample estimates for 
counties, since there is a potential for bias in the regression 
estimates from this source. 

Synthetic estimates will be similarly affected. High 
sampling variances may prove some limitation on the 
complexity of the synthetic estimates, but perhaps not to the 
same degree as for regression. Again, however, the amount of 
information provided by the sample data on the usefulness 
and reliability of the synthetic estimates will be restricted to 
about the same degree as regression estimation. Differential 
biases in the sample will affect the synthetic estimates, but 
here the key question concerns the extent of this bias by the 
analytic categories used to form the synthetic estimate. 


The preceding section identified a number of general 
methodological problems related to designing and producing 
small area estimates of census undercoverage. The emphasis 
was on obstacles and complexities in an effort to help other 
researchers incorporate these considerations in planning a 
program of small area estimation. This section will take a 
more constructive tone and discuss some possible ways in 
which empirical Bayes procedures might assist in the estimation. 

First, the allocation and distribution of sample cases for 
the PES and CPS will be reviewed for possible implications 
for specific small area approaches. Next, other aspects of 
modeling census undercoverage will be similarly explored. 
From this point, a proposed general strategy will be sketched 
for incorporating empirical Bayes ideas into the estimation. 

Summary of the PES Sample Allocation 

As it is currently envisioned, the PES sample allocation is 
designed to achieve a number of purposes. For the sake of 
discussion, the sample allocation may be thought to be 
determined by the following four steps: 

1. Allocation of a basic sample size to each State in order 
to guarantee a minimum target reliability on the 
estimated percent net undercount for each State. 

2. Supplementation of the sample in the largest States in 
order to improve both regional and minority estimates. 

3. Supplementation of the sample in 32 large central cities. 

4. Supplementation in the SMSA balances of the pre- 
ceding central cities in order to produce estimates for 
each of these SMSA's. 

In most States, the design will include the standard 
technique of a first-stage selection of a sample of smaller 
counties, with selection rates adjusted within each selected 
county to maintain a constant unconditional probability of selection 
for persons in the State. Not every county, therefore, will fall 
into the PES sample. 

As a consequence of both the allocation and the first-stage 
of selection, the number of sampled cases allocated to 
individual counties will vary in a complex manner. A 
particular group of counties, those containing the 32 central 
cities, will have a reliability almost comparable to that for 
small States. A few other counties, such as New Castle, Del., 
will also have a high level of reliability because they represent 
a large proportion of their total State population. From the 
few counties that will be quite reliably estimated, there is a 
range progressing downward to a very low level of reliability 
for most counties. Counties not containing the special cities 
but in their SMSA's will be among those in the middle to 
upper end of this reliability range. In general, outside the 
special SMSA's, the reliability of a county's estimate tends 
to be in proportion to its share of the total State population, 
except that the first-stage of selection tends to put a lower 
floor on the reliability for small sample counties. Because of 
the original allocation to States, a relatively large county in a 
small State generally has a higher level of reliability than the 
same size county (in numbers of persons) in a large State. 

Summary of the CPS Sample Allocation 

The sample allocation of the CPS has evolved over time to 
satisfy a number of requirements for labor force data. 6 The 
current design generally follows purposes 1, 2, and 4 just 
enumerated, except that purpose 4 is satisfied for a some- 
what different list of SMSA's. Central cities in the CPS are 
not specially identified for supplementation except as part of 
their SMSA's. Like the PES, the CPS design uses a first-stage 
selection of counties in some States. 

Again, the CPS design by itself will yield a wide range of 
sample sizes for individual counties. The number of counties 
with highly reliable estimates will generally be somewhat less 
using the CPS by itself, but at least a few counties will have 
quite reliable estimates. 

Other Considerations in Modeling Net Census Error 

As a general rule, small area estimation techniques relate 
data available in great detail and precision to a variable (such 
as net census underenumeration) that is of interest but for 
which the data have much less detail or sampling reliability. Thus, the 
basic issue in the design of such estimates is the method of 
expressing these relationships. 

A synthetic estimation approach in this application would 
involve the cross-classification of characteristics measured in 

6 The E-sample allocation will probably parallel that for the CPS. 
Hence, the comments here about allocation of the CPS apply to the 
evaluation program as of April 1980. 

the census with census omissions and erroneous inclusions 
for persons in the evaluation samples. This procedure easily 
incorporates characteristics attributed to persons, especially 
age, race, and sex. It is possible, however, that a significant 
number of other measurable variables may also be associated 
with census undercoverage: a number of geographic charac- 
teristics such as region, SMSA status, and size of place, as 
well as a number of quality measures for the census such as 
census close-out rates. Such variables are ecological in nature 
and may relate general area characteristics to census undercoverage. 

It is the author's view that a simple synthetic approach 
cannot be readily adapted to incorporate a wide range of 
variables. Logistic regression for erroneous inclusions and for 
census omissions separately may hold some promise, but a 
more direct approach may be to apply linear regression to 
sample estimates of net underenumeration for sampled 
counties. This last approach, with a few methodological 
nuances, could include as an independent variable a pre- 
liminary synthetic estimate for the area based on 
demographic characteristics. Furthermore, it is this technique 
that lends itself best to empirical Bayes refinements. 
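The simple synthetic approach described above amounts to a cross-classification arithmetic, sketched here with entirely hypothetical rates, cells, and counts: national net miss rates by demographic cell are applied to an area's census counts in the same cells.

```python
# Hypothetical national net undercoverage rates by demographic cell.
national_rates = {
    ("black", "male",   "20-34"): 0.10,
    ("black", "female", "20-34"): 0.04,
    ("other", "male",   "20-34"): 0.02,
}

# A hypothetical county's census counts in the same cells.
county_counts = {
    ("black", "male",   "20-34"): 5_000,
    ("black", "female", "20-34"): 6_000,
    ("other", "male",   "20-34"): 40_000,
}

# Synthetic estimate of net missed persons for the county: apply each
# national rate to the county's count in that cell and sum.
synthetic_missed = sum(national_rates[c] * county_counts[c]
                       for c in county_counts)
print(synthetic_missed)
```

Extending such a cross-classification to ecological variables (region, SMSA status, close-out rates) multiplies the cells rapidly, which is the scaling problem noted above.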

Empirical Bayes Possibilities 

Empirical Bayes estimation may be thought to include 
two basic notions: 

1. That, in order to estimate a quantity for a single unit, 
one may borrow information from similar units. 

2. That available sample data are allowed to shape, to an 
extent, the manner and degree to which data from 
other units are used. 

An earlier research effort (Robert E. Fay III and Roger A. 
Herriot, 1979, "Estimates of Income for Small Places: An 
Application of James-Stein Procedures to Census Data," 
Journal of the American Statistical Association, 74, pp. 
269-277) represents one of several illustrations of these ideas. 
To review this one example briefly, an empirical Bayes 
approach was employed to improve the average accuracy of 
estimates of income for small places. Sample data for these 
small places were provided by the 1970 census, but the 
small size of the sample in each of these places (with total 
population less than 1,000 persons) yielded only limited 
sampling reliability. The method consisted essentially of four steps: 

1. Fitting a regression to the sample estimates using 
independent variables free or virtually free of sampling 
variability (such as per capita income figures for the 
entire respective counties or tax return data for the places). 

2. Evaluating the goodness-of-fit of the regression relative 
to independently established estimates of sampling variability. 


3. For each place separately, weighting the regression and 
sample data together, considering the average 
goodness-of-fit of the regression and the sampling 
variability for the individual place. 

4. Applying a constraint to prevent the final estimate 
from substantially deviating from the sample estimate 
relative to the sampling variance. 

Thus, this procedure borrows information from other 
places through a regression analysis and determines the 
amount to which this information is used on the basis of the 
observed goodness-of-fit of the regression. 
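The four steps can be compressed into a small sketch (all data invented; the moment estimate of the model variance below is cruder than the machinery in the cited paper, and is offered only to make the weighting concrete):

```python
# Direct sample estimates y, an (assumed) error-free covariate x, and
# known sampling variances D for five hypothetical places.
y = [10.0, 12.0, 8.0, 15.0, 11.0]
x = [9.0, 13.0, 9.0, 14.0, 10.0]
D = [4.0, 1.0, 9.0, 2.0, 3.0]
n = len(y)

# Step 1: least-squares line through the sample estimates.
xbar, ybar = sum(x) / n, sum(y) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
pred = [ybar + slope * (xi - xbar) for xi in x]

# Step 2: average lack of fit beyond sampling error, as a crude
# estimate of the model variance A.
A = max(0.0, sum((yi - pi) ** 2 - Di
                 for yi, pi, Di in zip(y, pred, D)) / n)

final = []
for yi, pi, Di in zip(y, pred, D):
    # Step 3: precision weighting of direct and regression estimates.
    w = A / (A + Di)
    eb = w * yi + (1 - w) * pi
    # Step 4: constrain to within one standard error of the direct
    # estimate.
    lo, hi = yi - Di ** 0.5, yi + Di ** 0.5
    final.append(min(max(eb, lo), hi))
print([round(f, 2) for f in final])
```

With these invented inputs the regression fits well relative to the sampling variances, so the direct estimates receive little weight and the final values sit near the regression predictions, clipped by the step-4 constraint.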

Seemingly, these ideas could be directly transferred to the 
problem of estimating net census error. To do justice to this 
problem, however, a number of choices and potential 
difficulties must be addressed: 

1. The number of potential regression variables is much 
larger. Empirical Bayes procedures may be required to 
smooth estimates of regression coefficients over classes 

of independent variables. In some sense, an empirical 
Bayes procedure could in effect select a model by only 
slightly changing some classes of coefficients but 
drastically smoothing others. 

2. Evaluating the goodness-of-fit of a regression equation 
may hinge quite critically on the relatively small 
number of counties for which relatively precise sample 
estimates are available. It has been pointed out already 
that this is a fairly special group of counties. 

3. The question of geography may be allowed to enter in 
a complex way. It might be argued, for example, that 
county estimates should be adjusted to consistency 
with the sample estimate for the State. Sample 
estimates for major sub-State areas (say the metro/ 
nonmetro split) may also be used, but perhaps through 
some additional empirical Bayes procedure. 

Certainly, there are many possible paths to elaborate these 
basic ideas into a working procedure. Much will depend upon 
the sampling variances that are achieved by the evaluation surveys. 


Tommy Wright 

Union Carbide Corporation 

Since the paper by Dr. Fay was not available for review 
prior to the conference, my comments will be limited to the 
paper by Professor Kish and the paper by Professor Dempster 
and Mr. Tomberlin. 

One of the marks of excellent teachers is the value they 
place on a good foundation. That is, excellent teachers first 
emphasize the ideas, concepts, and definitions. This appears 
to be the theme of the paper by Professor Kish. 

Many times I suspect that simply because different terms 
are used to describe the same thing, we have several people 
getting the same results independently of each other. A 
classical example of this is perhaps the optimal allocation 
credited to Neyman [3] in stratified sampling, which was 
published 11 years earlier by Tschuprow [5]. 

The five categories of missing units as listed by Professor 
Kish [(1) item nonresponse, (2) total nonresponse, (3) 
cluster nonresponse, (4) noncoverage, and (5) deliberate and 
explicit exclusions] appear to be sufficiently broad to cover 
completely everything one would want to consider when 
talking about various kinds of missing data. The extent to 
which one should be worried about missing data involves 
consideration of the consequences. Kish notes that the 
consequences tend to be worst when one is concerned with 
the estimation of simple totals. 

Uniform terminology and consistency of ideas and con- 
cepts are important considerations if several people are to 
communicate their ideas to each other with minimum confusion. 

The paper by Professor Dempster and Mr. Tomberlin 
introduces the concept of "the probability, p, that an 
individual will be enumerated in the census." The proposed 
method of attack is a PES with a multistage design. The 
probability, p, is a function of several factors, including 
factors that consider geography, factors that consider social 
characteristics, etc. One seeks an estimate for the under- 
count, U, and uses ratio estimation of the type 

Û = (q̂/p̂)C (1) 


where p̂ is an estimate for p (q̂ = 1 − p̂), C is the number of 
people in the PES who were counted in the census, and Û is 
the estimate of the undercount. Ways of estimating U at 
various levels and for various groups of interest are discussed 
with the use of a logistic model 

logit(p) = log[p/(1 − p)] = θ + sum of other parameters (2) 

where θ is fixed. 

The other parameters are fixed in some cases and 
permitted to vary in others. (The model seems to permit 
much flexibility.) When the parameters vary, prior distribu- 
tions are assumed and discussions involving Bayesian 
procedures and posterior distributions surface. Attention 
centers on estimation of small areas. The authors give a 
sketch of their plan and acknowledge that research is needed 
to fill a number of gaps. But they feel reasonably confident 
that the gaps can be filled and cite a number of recent 
articles that support their claims. 
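The ratio estimator Û = (q̂/p̂)C and the logit transformation can be sketched numerically with invented PES figures (the counts below are illustrative only, not from any census program):

```python
import math

# Invented PES figures.
pes_sample = 2_000      # persons in the PES sample
matched    = 1_900      # of those, matched to a census record
C          = 950_000    # census count for the domain

p_hat = matched / pes_sample        # estimated enumeration probability
q_hat = 1 - p_hat
U_hat = (q_hat / p_hat) * C         # ratio estimate of the undercount
print(round(U_hat))

# The logistic model expresses logit(p) as a sum of effects; p is
# recovered by inverting the transformation.
theta = math.log(p_hat / (1 - p_hat))
p_back = 1 / (1 + math.exp(-theta))
print(round(p_back, 4))
```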

The Bayesian framework is not totally clear in the usual 
sequence of parameter, prior distribution, loss (though 
quadratic loss is implied), observations dependent on param- 
eter, and posterior distribution. Specifically, is there a prior 
on θ in (2)? 

I agree that there is great hope in "matching studies" and 
we should note the planned use of this technique for the 
1980 census (also earlier censuses) as noted in the report, 
"Counting the People in 1980: An Appraisal of Census 
Plans." However, it appears that the type of matching study 
need not be limited to the postenumeration survey type as 
suggested by Dempster and Tomberlin and seemingly favored 
by the Census Bureau. One might also consider a pre- 
enumeration survey or a procedure where persons of various 
kinds are unknowingly "captured and tagged" (only on paper) 
either by a purposive process or by some randomization process 
independent of the census enumeration process. The possible 
noncoverage of important subgroups or undercount of the 
PES that Dempster and Tomberlin allude to in their paper 
might be avoided with such a technique. (We have reports 
from the Bureau that there appears to be a tendency for the 
PES to miss the same groups as the census.) Of course some 
sort of controlled selection is no doubt used in the PES to 
decrease the probability of noncoverage of important sub- 
groups. Considering the huge amount of available data before 
each census, it seems only natural that one would want to 
consider the undercount problem in the Bayesian setting. 

The thoughts of Dempster and Tomberlin remind me of a 
problem of recent interest to me in Health and Safety 
Research, which I will briefly mention below using two 
slightly different models. (See Wright [6] and Bratcher, 
Schucany, and Hunt [1] ). 

Model I. We are given a finite population of size N. After 
the census, we assume that the number of people missed is 
M (N = N_c + M, where N_c is the number counted in the 
census). Suppose that we have prior feelings concerning the 


true value of M and that it is expressed by using the 
beta-binomial prior distribution 

f_1(M | α, β, N) = C(N, M) B(M + α, N − M + β)/B(α, β) 

for M = 0, 1, . . . , N, where C(N, M) denotes the binomial 
coefficient and 

B(α, β) = ∫₀¹ t^(α−1) (1 − t)^(β−1) dt 

for α > 0 and β > 0. 

The values of α and β are chosen to represent the state of 
prior knowledge about M. This family is suitable because it is 
"rich" and the mathematics is tractable. We next take a 
random sample PES of size n and observe that m of the n 
were missed in the original census. It is routine to show in 
this setting that the posterior distribution of M given the 
sample is also beta-binomial 

f_3(M | m) = C(N − n, M − m) B(M + α, N − M + β)/B(m + α, n − m + β) 

for M = m, m + 1, . . . , N − n + m. 

An estimate of M can be found as follows. Let 

M̂ = E(M | m) = (N − n)(m + α)/(n + α + β). 

Substituting N_c + M̂ for N and solving for M̂ gives 

M̂ = (N_c − n)(m + α)/(n − m + β). 

(Recall Û = (q̂/p̂)C.) 
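As a numerical check on Model I (all inputs invented), the beta-binomial posterior can be evaluated with log-gamma arithmetic to confirm it is a proper distribution over its support, and a point estimate of the form M̂ = (N_c − n)(m + α)/(n − m + β), following the substitution argument in the text, can be computed directly:

```python
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_comb(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

# Small case with N known, to confirm the posterior of M sums to one
# over M = m, ..., N - n + m.
N, n, m = 50, 10, 2
a, b = 1.0, 1.0          # flat beta prior on the miss rate

def posterior(M):
    return exp(log_comb(N - n, M - m) + log_beta(M + a, N - M + b)
               - log_beta(m + a, n - m + b))

total = sum(posterior(M) for M in range(m, N - n + m + 1))
print(round(total, 8))

# The closed-form point estimate, with the census count N_c observed
# and a PES of size n in which m persons were found to be missed.
N_c, n, m = 100_000, 1_000, 40
M_hat = (N_c - n) * (m + a) / (n - m + b)
print(round(M_hat))
```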

Model II. We are given a finite population of size N, we 
believe. The population contains an unknown number of 
people in category 1, say N_1. Assume a beta-binomial prior 
on N_1 of the form 

f_1(N_1 | α_1, β_1, N) = C(N, N_1) B(N_1 + α_1, N − N_1 + β_1)/B(α_1, β_1) 

for N_1 = 0, 1, . . . , N. 

To find N̂_1 (i.e., estimate N_1), we take a census and 
observe n people altogether, of which n_1 are observed to be in 
category 1. We look upon the n people observed as a sample, 
and in particular a random sample. (Perhaps a "Bernoulli 
sample" is more appropriate; see Strand [4].) After the 
census, the overall undercount is N − n and the undercount 
for category 1 is N_1 − n_1. (That is, the numbers of persons 
missing are M = N − n and M_1 = N_1 − n_1.) The posterior distribu- 
tion for N_1 given the sample is 

f_3(N_1 | n_1) = C(N − n, N_1 − n_1) B(N_1 + α_1, N − N_1 + β_1)/B(n_1 + α_1, n − n_1 + β_1) 

for N_1 = n_1, n_1 + 1, . . . , N − n + n_1. 

Now given that we observe n_1 people in category 1 in the 
census (random sample) of size n, the probability that there 
are exactly N_1L people in category 1 in the uncounted part 
of the population is 

P(N_1 = N_1L + n_1 | n_1) = C(N − n, N_1L) B(N_1L + n_1 + α_1, N − N_1L − n_1 + β_1)/B(n_1 + α_1, n − n_1 + β_1) 

= f_3(N_1L + n_1 | n_1). 

The above models are presented here merely as alterna- 
tives of possible initial considerations when considering 
models of the undercount problem. Indeed, some assumptions 
of Model II are suspect, for example, the assumption that the 
people enumerated form a random sample of the total 
population. However, it might be a starting point. One 
advantage of Model II is that it can easily be extended to 
determine the probability of missing anyone in k categories 
and can give estimates for M_1, M_2, . . . , M_k. We would take a 
Dirichlet-multinomial as our prior distribution. 
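Under Model II's (admittedly suspect) random-sample assumption, the posterior mean of N_1 gives a quick estimate of the category-1 undercount. A sketch with invented figures and a flat prior, using the standard beta-binomial posterior mean (the n_1 observed, plus the expected category-1 share of the N − n uncounted persons):

```python
# Invented figures: believed total N, persons counted n, and counted
# persons observed in category 1.
N, n = 1_000_000, 950_000
n1 = 190_000
a1, b1 = 1.0, 1.0    # flat beta prior on the category-1 proportion

# Posterior mean of N_1: the observed n1, plus the expected share of
# the N - n uncounted persons falling in category 1.
N1_hat = n1 + (N - n) * (n1 + a1) / (n + a1 + b1)
M1_hat = N1_hat - n1            # estimated category-1 undercount
print(round(N1_hat), round(M1_hat))
```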

I would like to close with the following comments. 

(1) It is interesting to note that the law dictates a 
complete enumeration and objects to sample estima- 
tion. However, the final numbers are adjusted figures 
due to sampling. In the Kish [2] spirit, perhaps 
the lawmakers should ask, in this "Age of Survey 
Sampling," what do we want to accomplish? After 
this issue is settled, one can ask how to accomplish 
these goals. It may mean a complete census is still 
necessary; it may mean that sampling might be 
sufficient; or it might mean that one can make use of 
censuses and surveys in the enumeration process. 

(2) It would also be of interest to consider whether a 
complete census every 10th year might not be in 
some way equivalent to one-tenth of a census every 
year for 10 years, or to one-half of a census every 
fifth year for 10 years, etc. 


1. Bratcher, T. L., Schucany, W. R., and Hunt, H. H. 
"Bayesian Prediction and Population Size Assumptions." 
Technometrics, 13 (1971), 678-681. 

2. Kish, L. "Samples and Censuses." International Statistical 
Review, 47 (1979), 99-109. 

3. Neyman, J. "On the Two Different Aspects of the Repre- 
sentative Method: The Method of Stratified Sampling 
and the Method of Purposive Selection." Journal of the 
Royal Statistical Society, 97 (1934), 558-606. 


4. Strand, M. M. "Estimation of a Population Total Under a 
'Bernoulli Sampling' Procedure." The American Statis- 
tician, 33 (1979), 81-84. 

5. Tschuprow, A. A. "On the Mathematical Expectation of 
the Moments of Frequency Distributions in the Case of 
Correlated Observations." Metron, 2 (1923), 461-493, 
646-683. 

6. Wright, T. "More Tables for the Probability of Missing 
Hot Spots With Skewed Priors." Internal Memo, Union 
Carbide Corporation, Nuclear Division, 1980. 

Floor Discussion 

A question arose concerning clarification of a variance 
that Dr. Fay mentioned being increased by a factor of 6 to 
20. There was discussion of the magnitude of the variance, 
since it may shed some light on how difficult the problems 
are that are being discussed. Dr. Fay responded that as a first 
cut, instead of using pq/n (where p is the census undercoverage 
rate and n is the number of sample persons) as the variance of a 
sample estimate of missed persons, the average number of 
persons per household was used as a design effect, so that 
instead of dividing by the number of persons, division was by 
the number of households to approximate the variance. 
However, because of the area-sampling nature of the PES, 
because a problem can occur within a block (such as the 
complete miss of a block in the census), and because the 
missed rate includes both erroneous omissions and erroneous 
inclusions that are both very large and not highly enough 
correlated to offset each other, the variance is greatly 
increased. This was determined by taking census pretest 
results and simulating the variance that would be obtained 
for a sample designed to take a probability-proportional-to- 
size sample of blocks with subsampling within blocks for an 
equal-weighting sample. It was found that the design effect in 
these data meant that instead of a coefficient of variation 
for the corrected population of 0.3 percent at the State level, 
it might be roughly in the neighborhood of 1.0 percent, 
which may yield statements for estimates of undercounts for 
States that could be as large as 2 percent, plus or minus. This 
is somewhat disheartening. Whether this situation will 
reoccur in 1980 is questionable. 

Further clarification of Dr. Fay's point indicated that by 
doing computations using robust methods of estimating the 
variance of the results, the variance, instead of being pq/n, was 
6 to 20 times that big. 
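The order of magnitude being discussed can be reproduced with a back-of-envelope sketch (all inputs invented): inflate the binomial variance pq/n by a design effect in the 6-to-20 range and observe the effect on the standard error of the estimated rate.

```python
p = 0.025    # assumed net undercoverage rate
n = 2_700    # sample persons behind a hypothetical State estimate
deff = 10    # design effect within the 6-to-20 range discussed

var_srs = p * (1 - p) / n    # simple-random-sampling variance
var_des = deff * var_srs     # variance under the clustered design

se_srs = 100 * var_srs ** 0.5    # standard errors, in percentage points
se_des = 100 * var_des ** 0.5
print(round(se_srs, 2), round(se_des, 2))
```

With these inputs the standard error moves from about 0.3 to about 1.0 percentage point, matching the range cited in the discussion.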

It was questioned whether this meant that the PES is 
anticipated to be useless. The Census Bureau responded that 
this is only a variance consideration. If the same results 
materialize in 1980, the Bureau will have much bigger 
variances than it had hoped to obtain, but that may not 
make the data useless. It was emphasized also that these 
concerns apply only to the PES. The Bureau does not have a 
similar evaluation as to what may be expected from the CPS, 
which may turn out better as it is less clustered. 

Built-in overall surveillance that protects against the 
effects of extreme outliers or wild values was also suggested. 
The Bureau indicated that this could be done, except that 
perhaps the direct sample estimate is the most direct way of 
representing what really happens nationally; that is, with 

a very large sample, one can even admit into the estimate the 
effects of those outliers. It was felt that perhaps models 
should be made to handle outliers subnationally, particularly 
when State estimates are derived, as one might not be able to 
accept the effects of such things at a State level. 

The group inquired as to whether or not the sampling of 
census records for the administrative records match will be 
on a block level so that entire blocks will be sampled. If so, it 
may be more feasible for the estimates to be in terms of the 
proportion of the block missed rather than the likelihood of 
the individual being missed. Then blocks could be looked at 
in terms of the probability of major problems occurring. 

Although the value and future of the PES had been 
questioned earlier in the discussion, it was suggested that 
some form of data file be constructed that can be used by 
outsiders. It was felt that a serious difficulty in the research 
into the undercount has been that much of the data has been 
unavailable to outsiders. It was concluded that many of the 
suggestions made at the conference, in fact, would have been 
improved if people had been able to conduct some explora- 
tory analysis on the data. It was recommended that some of 
the work done in 1970 be made into a public use file, 
particularly matching work done with administrative records 
involving the CPS, which was not finished in a form that 
could be made public. 

Given the life expectancy of administrative records, little 
problem with data confidentiality should be experienced 
now with those types of data, provided that geographic 
identification is still restricted. This might have particularly 
great potential, since any undercount adjustment made 
should be consistent with administrative systems, given the 
heavy use made of administrative records by policymakers. 
Multisystem matching at least on a sample basis also was 
reinforced (e.g., Medicare, Social Security, Employment 
Security, Internal Revenue Service). 

There may be some difficulty experienced because the 
populations covered by the files do not overlap sufficiently, 
which creates difficulty in multiple-system estimation even 
though sampling is large; that is, there is a possibility for 
large sampling errors, but also large biases if the underlying 
assumptions are not met, as they might not be. However, 
having these types of data available at a very detailed 
geographic level over many years could make significant 
contributions to some of the synthetic procedures that are 
going to be applied. 

The nonlinear form of the synthetic estimates introduced 
by Dempster and Tomberlin also was supported. This was 
thought possibly to be a partial answer to the problem 
referred to by Dr. Fay— the fact that synthetic estimates may 
collapse when one tries to combine them due to the 
nonlinear nature of the undercount. 

It was also speculated that some of the problems with the 
variance referred to by Dr. Fay may point to the 
necessity to go to an indirect synthetic or regression-type 
estimate even for the areas where there are data for making 
direct estimates, or possibly to a James-Stein type of combina- 
tion, which would give, in this case, very low weight to the 
direct estimate because of its high variance. 


The Impact of Census Undercoverage 
on Federal Programs 

Courtenay M. Slater 
U.S. Department of Commerce 

The widespread interest in statistical adjustment of 
decennial census data stems in large part from the conviction 
that differential population undercoverage produces serious 
inequity in the administration of Federal programs, 
especially programs which distribute funds in accordance 
with statistical formulas. Since many billions of dollars are 
distributed each year through such formula grant programs, 
the concern is understandable. Yet the kind of compre- 
hensive evaluation of the impact of census undercoverage on 
Federal programs needed for an informed decision on 
adjustment of the undercount is lacking. 

In order to initiate structured discussion and investigation 
of the Federal program impacts of possible adjustment for 
census undercoverage, members of the interdepartmental 
Statistical Policy Coordination Committee (SPCC) recently 
were asked to respond to the following questions: 

1. How would statistical series prepared by Federal 
agencies be affected by adjustment of the census data? 

2. What Federal fund allocations or other program 
administration activities would be affected and to what extent? 

3. Assuming adjusted data would become available in 
1983, what problems would be created by: 

(a) use of preliminary (unadjusted) data during the interim period? 

(b) shifting to revised (adjusted) data when it becomes available? 

4. In addition to population totals, what other 
characteristics (e.g., income, employment, age, race, 
sex, ethnic origin) would be required on an adjusted 
basis for Federal program administration? What would 
be the impact on the agencies of having some data 
available on an adjusted basis and other data available 
only on an unadjusted basis? 

5. What degree of geographic detail would be required for 
adjusted data used in Federal program administration? 

Drawing on agency responses to this request as well as on 
earlier research work done at the Census Bureau and 
elsewhere, this paper attempts to identify some of the 
Federal program considerations which should enter into 
decisions on whether corrections for census underenu- 
meration should be made and, if so, how they should be 
made statistically. 


First, what are the statistical programs which are affected 
by census undercoverage and what is the nature of the 
effects? The affected statistical programs are not limited to 
the census data itself. Other statistical programs are affected 
in at least three important ways: Through use of census data 
to derive postcensal estimates, through the use of census 
figures as a sampling frame and as the basis for control totals, 
and through use of census data as the denominator in ratio computations. 

The Census Bureau's population and per capita income 
estimates illustrate the use of census data to derive postcensal 
estimates. The monthly Current Population Survey (CPS) 
illustrates the use of census counts as a sampling frame and as 
the basis for control totals. Vital statistics (birth and death 
rates) illustrate the use of census population totals as a 
denominator. Per capita income estimates are another 
example of the use of population as a denominator. 

Postcensal Estimates 

Since the inception of the general revenue sharing 
program in 1972, the Census Bureau has prepared local area 
population and income estimates biennially for use in 
distributing these funds. Data from the 1970 census serve as 
the starting point for these estimates, and the estimates are 
affected both by 1970 population undercoverage and by 
underreporting of income among those who were counted in 
the 1970 census. 

Current Population Survey 

The CPS, a monthly Census Bureau survey of 71,000 
households, is the source of a wide range of current data on 
household characteristics. Most importantly for the present 
discussion, it is the source of monthly national data on labor 
force employment and unemployment, it is a crucial 
component of local area employment and unemployment 
estimates, and it is the source of widely used annual 
estimates of individual and family income and poverty. 
Postcensal estimates consistent with the decennial census 
provide control totals for the CPS, and census undercoverage 
leads to similar undercoverage rates in the CPS. 



Vital Statistics and Per Capita Income Estimates 

Birth, marriage, divorce, and death rates are computed by 
the National Center for Health Statistics (NCHS) as the ratio 
of registered births, marriages, etc., to population totals 
provided by the Census Bureau. With one exception, these 
statistical series make no adjustment for census under- 
coverage. Per capita income estimates prepared by the 
Bureau of Economic Analysis (BEA), which similarly utilize 
census population data, have widespread program uses at all 
levels of government. The income estimates are obtained 
from sources independent of the census, largely adminis- 
trative records. These income totals then are divided by 
population figures which do reflect census undercoverage, 
causing an overstatement of income per capita. 
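The arithmetic of that overstatement can be sketched in a few lines. The income total, population, and undercoverage rate below are hypothetical illustrations, not figures from this paper:

```python
# Hypothetical figures illustrating the overstatement described above:
# an income total from administrative records is divided by a census
# population count that misses part of the true population.

def per_capita(total_income, population):
    return total_income / population

true_pop = 100_000                      # assumed true population
undercount_rate = 0.025                 # assumed 2.5% undercoverage
census_pop = true_pop * (1 - undercount_rate)
total_income = 800_000_000              # assumed administrative total

overstated = per_capita(total_income, census_pop)
actual = per_capita(total_income, true_pop)
# the overstatement factor is exactly 1 / (1 - undercount_rate)
print(f"{overstated:,.0f} vs {actual:,.0f}")
```

Because the income numerator comes from sources independent of the census, the per capita figure is inflated by exactly the factor 1/(1 − undercount rate) for the area.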


Evaluation of the impact of census undercoverage on 
Federal funding programs must take into account the use of 
census data in preparing other statistical series, as illustrated 
above, as well as the direct use of census data in program 
administration. Some of the major program uses of the 
affected statistical data are described below. In some cases 
the impact of undercoverage can be illustrated using past 
data; in others, no studies illustrating such effects have been made. 

Revenue Sharing 

The program whose reliance on census data is best known 
and most thoroughly studied, undoubtedly, is general rev- 
enue sharing. Under this program, nearly $7 billion will 
be distributed to State and local governments this fiscal year 
based on a formula utilizing population, per capita income, 
and tax effort. Studies by Jacob Siegel and others have 
demonstrated that income undercoverage has a far greater 
impact on the distribution of revenue-sharing funds than 
does population undercoverage. This is true of funds dis- 
tributed to the States (1/3 of the total) and even more 
pronounced with respect to funds distributed to localities 
(2/3 of the total). Siegel's 1975 study indicates that 
adjustment for income underreporting would have shifted 
revenue-sharing allocations by 3 percent or more for 10 
States. In contrast, adjustment of the population count 
would not have affected any State by as much as 3 percent. 
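A toy calculation makes the relative leverage of the factors concrete. This is a simplified sketch, not the statutory revenue-sharing formula: it merely assumes an area's share rises with its population and tax effort and falls with its per capita income, as the three factors named above suggest. All figures are hypothetical.

```python
# Simplified sketch of a three-factor allocation: each area's weight is
# population * tax effort / per capita income, normalized to shares.
# Because per capita income enters as a divisor, an error in income
# moves an area's share at least as strongly as an equal error in its
# population count.

def shares(areas):
    weight = {name: a["pop"] * a["tax_effort"] / a["pci"]
              for name, a in areas.items()}
    total = sum(weight.values())
    return {name: w / total for name, w in weight.items()}

areas = {"A": {"pop": 50_000, "pci": 6_000, "tax_effort": 0.05},
         "B": {"pop": 50_000, "pci": 8_000, "tax_effort": 0.05}}
s = shares(areas)
print({k: round(v, 3) for k, v in s.items()})
```

With equal populations and tax effort, the lower-income area A draws the larger share, which is why understating an area's income (or overstating its per capita income through an undercount) shifts funds away from it.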

A 1979 study by Siegel and Robinson examines impacts 
of adjustment on localities in Maryland and New Jersey. The 
dominance of the income factor is even greater in the 
formula governing distributions to localities. The authors 
conclude: 

. . . per capita income is the dominant element among the 
various elements in the revenue-sharing formula at the 
sub-State level and has a superordinate effect when data 
errors in the formula are corrected. Large shifts in funds 
among counties and local areas result when the per capita 
income factor is corrected. . . . The transfer of funds 
among areas that is realized when the population compo- 
nent is adjusted for underenumeration is small by 
comparison. If the cause of equity is to be served in the 
distribution of revenue-sharing funds to local areas, it may 
be more important to develop and apply accurate 
corrections for the data on income than to apply 
corrections for the population counts. 

Several points are worth noting: 

1. In these studies, the income data were adjusted for 
both the underenumeration of the population and the 
underreporting of income by those who were counted. 

2. Adjustment for both population and income caused 
little increase in the shift of apportioned funds, 
compared to adjusting income alone, especially at the 
local level. Funds allocated to localities in Maryland 
and New Jersey would have been shifted by an average 
of 9 percent by this combined adjustment, but the 
overwhelmingly important sources of the adjustment 
are the income factors in the formula. 

3. Results vary depending on the method utilized to 
adjust for the undercount. Most of the results 
described here are based on a simple synthetic adjust- 
ment. Tests of different adjustment methods at the 
State level in some cases yield fairly large differences in 
the results. Alternative undercount adjustment 
methods at the local level have yet to be evaluated. 

4. Since a fixed funding total is assumed, the amount of 
money withdrawn from some governments by defini- 
tion equals the increase provided others. However, in 
the Maryland-New Jersey study, the number of local 
governments losing funds consistently exceeded the 
number of gainers. In Maryland, the number of losers 
outnumbered the gainers by 4 to 1; similarly, but less 
dramatically, 56 percent of local governments in New 
Jersey would have been losers. It also seems to be the 
case that more often than not the losing localities 
represent a larger fraction of total population than do 
the gainers. 
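The "simple synthetic adjustment" mentioned in point 3 can be sketched as follows. The demographic groups and national miss rates below are hypothetical stand-ins for the published national undercoverage estimates, as are the area counts:

```python
# Minimal sketch of a synthetic undercount adjustment: national
# undercoverage rates estimated for demographic groups are applied
# uniformly to each local area's census counts in those groups.
# All rates and counts are hypothetical.

national_undercount = {          # assumed fraction missed, by group
    ("black", "male"): 0.08,
    ("black", "female"): 0.04,
    ("other", "male"): 0.02,
    ("other", "female"): 0.01,
}

def synthetic_adjust(area_counts):
    """Inflate each group's census count by its national miss rate."""
    return {
        group: count / (1 - national_undercount[group])
        for group, count in area_counts.items()
    }

area = {("black", "male"): 9_200, ("black", "female"): 10_600,
        ("other", "male"): 38_000, ("other", "female"): 40_000}
adjusted = synthetic_adjust(area)
print(round(sum(adjusted.values())))
```

The method's weakness, noted in point 3, is visible in the code: every area with the same demographic mix receives the same proportional correction, regardless of how its actual coverage differed from the national average.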

Conclusions about which one might speculate based on 
these studies are: 

1. Adjustment of population counts only appears to be of 
limited value. Adjustment of income clearly seems 
more important, but is of greater complexity because 
independent sources must be found to adjust for 
income underreporting by those covered by the census. 

2. The political acceptability of adjustment appears in 
doubt, since losers outnumber gainers. 


3. Equity in the allocation of revenue-sharing funds might 
more readily be achieved by revising the formulas 
rather than adjusting the census. 

4. Finally, discussion of the basic merits of revenue 
sharing lies outside the scope of this paper, but it is 
worth noting that the present authorization expires in 
September of this year, and the future of the program 
is not entirely certain. 

Programs Relying on the Current Population Survey 

As indicated above, the CPS is a source of labor market 
information and information on income and poverty. Both 
types of data are widely used in program administration. 
Fund allocations based in part on CPS employment and 
unemployment data include: 

• General training and employment programs, budgeted 
at $2.8 billion in 1981 

• Public service employment, $4.4 billion 

• Youth employment programs, $2.6 billion 

• Countercyclical and targeted fiscal assistance, $1.0 billion. 

Data on the prevalence of poverty or low income also 
enter into the allocations for some of the above programs. 
Other programs utilizing household income data from the 
CPS as an element in determining fund allocations include: 

• Community development block grants, budgeted at 
$3.8 billion in 1981 

• Urban development action grants, $0.4 billion 

• Low-income energy assistance, $2.4 billion. 

These lists are not exhaustive, but they are sufficient to 
illustrate that we are concerned with a wide variety of 
programs and that large sums of Federal monies are involved. 
It is not possible to cover here all the ramifications of the use 
of CPS data in each of these programs, but several points 
may be noted. 

1. In general, each program uses a different set of CPS 
data, and the data enter into the formula in different 
ways. Some use unemployment rates, some use 
absolute numbers of employed or unemployed, some 
use the number of persons below the poverty level, 
some use multiples of the poverty threshold, and so on. 

2. Typically, these programs require data estimates for 
small places, in some cases for each of the Nation's 
39,000 units of general government. 

3. The local area employment and unemployment data 
used in administering the various employment and 
training programs are obtained only partially from the 
CPS. However, the other data sources used in these 

programs also rely heavily on the decennial census as a 
starting point. The Commissioner of Labor Statistics 
states that "It is no exaggeration to say that every 
series published in the Local Area Unemployment 
Statistics program would be affected by [census 
undercoverage] adjustment." 

Even so, because the uses of the data are so varied, the 
effects of census undercoverage are difficult to estimate. The 
use of the census as a sampling frame is not thought to be the 
source of any major CPS error, since various procedures are 
followed to adjust the sampling frame. The use of census- 
derived population estimates as control totals for interpreting 
the survey results does cause census undercoverage to affect 
CPS estimates, however. 

Some examination has been made of the effects of census 
undercoverage on national employment levels and rates. 
These studies indicate that, although the number of persons 
identified as employed and unemployed would be signifi- 
cantly increased by a synthetic adjustment for census 
undercoverage, unemployment rates would be affected very 
little. Johnston and Wetzel's 1969 study estimated that the 
national unemployment rate in 1967 would have risen only 
from 3.8 to 3.9 percent through correction for census 
undercoverage. This result held, whether the uncounted were 
assumed to have the same labor force status as the average 
for their age, sex, and race, or whether they were assumed to 
have the characteristics of those of the same age, sex, and 
race living in urban poverty neighborhoods. More recent 
Census Bureau calculations have also shown little change in 
the unemployment rate when it is corrected for census 
undercoverage. On balance, the uncounted fall into age, sex, 
and race groups having unemployment rates higher than the 
published national average, but they are too small a 
percentage of the total to have much effect on the national 
unemployment rate. 
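The arithmetic behind that small movement can be reproduced with hypothetical figures chosen to echo the 1967 example; the miss rates assumed below are illustrative, not estimates from the studies cited:

```python
# Why a synthetic adjustment moves the national unemployment *rate*
# very little even though it adds many people: the uncounted are a
# small share of the labor force, so even a higher assumed miss rate
# among the unemployed barely shifts the ratio.

employed = 74_000_000
unemployed = 2_920_000            # roughly a 3.8 percent rate
rate_before = unemployed / (employed + unemployed)

# assume the unemployed were missed at a higher rate than the employed
adj_employed = employed / (1 - 0.02)
adj_unemployed = unemployed / (1 - 0.05)
rate_after = adj_unemployed / (adj_employed + adj_unemployed)

print(f"{rate_before:.3%} -> {rate_after:.3%}")
```

Even with the unemployed assumed to be missed at more than twice the rate of the employed, the national rate moves by only about a tenth of a percentage point, consistent with the 3.8 to 3.9 percent result described above.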

It may, of course, be argued that those who are not 
counted in the census have even higher unemployment rates 
than the average residents of poverty neighborhoods, but, 
not having managed to count these people, we have no data 
with which to test this assertion. 

It may also be argued that it is not the national average 
unemployment rate, but the rates for various age, race, or sex 
categories which are of concern. Surprisingly, the available 
studies suggest that correction for census undercoverage may 
slightly lower the unemployment rate for blacks. This is 
because undercoverage is especially high for adult black 
males, a group with a lower unemployment rate than that 
for all blacks. Intuitively, one resists this result. Surely, the 
uncounted have higher than average unemployment rates, 
but test results show that the census undercoverage rate for 
employed blacks is higher than for unemployed blacks. 
Intuition is sometimes a dangerous guide. 
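The counterintuitive result can be seen in miniature with hypothetical figures: if undercoverage is higher among a group's employed members than among its unemployed members, adjustment lowers that group's unemployment rate.

```python
# Illustration (all figures and miss rates hypothetical): when the
# employed are missed at a higher rate than the unemployed, inflating
# both counts *lowers* the group's unemployment rate.

employed, unemployed = 4_000_000, 350_000
miss_employed, miss_unemployed = 0.10, 0.06   # assumed miss rates

rate = unemployed / (employed + unemployed)
adj_rate = (unemployed / (1 - miss_unemployed)) / (
    employed / (1 - miss_employed) + unemployed / (1 - miss_unemployed))

print(f"{rate:.2%} vs adjusted {adj_rate:.2%}")
```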

It might further be argued that national data are of little 
interest in the present context; it is accurate measurement of 
regional variation which is of concern. Certainly, the degree 
of regional variation is of concern. Very little information 
about it is presently available, however. 

Use of household income and poverty data from the CPS 
raises a new set of complications. Poverty is a household 
rather than an individual concept; determination of whether 
a household is below the poverty line depends on household 
size and household income. Estimates of census undercoverage 
have focused on the number of uncounted individuals. In 
order to examine the effect of undercoverage on poverty 
estimates, assumptions or imputations would be necessary 
regarding characteristics, not of the individuals themselves, 
but of the households in which they reside. A large part of 
census undercoverage stems from failure to count entire 
households. Presumably, one could make some sort of 
imputations about the characteristics of these households. 
However, it would seem quite risky to assume, without a 
great deal more investigation, that such arbitrary imputations 
would, in fact, produce either more accurate information or 
greater equity in the allocation of funds. 

What conclusions or speculations can one draw from all 
this? 


1. The CPS, heavily reliant on decennial census data for 
its design and execution, is a major source of statistics 
used in the allocation of Federal funds. Its combined 
uses total a great deal more money than is involved in 
the general revenue sharing program. 

2. The extent of the error introduced into the CPS, 
because of census undercoverage, can be and has been 
at least roughly measured. However, little attention has 
been focused on the geographic distribution of the 
error or on the implications for Federal programs. A 
great deal more research is in order. 

3. As was the case with the general revenue sharing 
program, it is not just accurate population counts 
which are important. One must also know— or be able 
to estimate— characteristics of the uncounted; not only 
race, age, and sex, but employment status and some- 
thing about the size and income of the households in 
which they live. It is hoped that the postcensal 
evaluation program, planned in conjunction with the 
1980 census, will provide information which will yield 
some clues regarding these characteristics of the 
uncounted. At the moment, however, we do not have 
the information necessary to impute characteristics 
such as these with any useful degree of accuracy. 

Programs Utilizing Per Capita Income Estimates 

As noted, census population counts and postcensal 
population estimates based on these counts are widely used 
as denominators for computing other statistical series. The 
per capita income estimates prepared by the Bureau of 
Economic Analysis are a leading example of such a use. 

These per capita income statistics are utilized in admini- 
stering a multitude of Federal programs. The BEA has 
identified fund allocation uses which totaled $29 billion in 
fiscal 1979. This is not necessarily an exhaustive list. The 
typical— though not the only— purpose of using these esti- 
mates is to identify the prevalence of low-income popula- 
tions in need of various types of Federal assistance. 
Examples of the uses include: 

• Educational programs (vocational education grants, 
library grants, and others), obligations of $1.5 billion in 
fiscal 1979 

• Public Health Service programs, $0.5 billion 

• Medicaid, $11.8 billion 

The Medicaid program is one of several large Federal 
assistance programs which (on either a required or an 
optional basis) utilize BEA's State per capita income 
estimates to determine the matching requirement for the 
State's contribution. 

The BEA develops the per capita income figures by first 
estimating total personal income and then dividing it by 
population estimates supplied by the Census Bureau. The 
personal income estimates are derived from sources inde- 
pendent of the population census; about 90 percent of the 
information comes from administrative records such as 
unemployment insurance, social security, and income tax 
records. Although the data sources contain various imperfec- 
tions, their coverage is much more complete than income 
estimates obtained from census or household survey data. 
Dividing these data by a population estimate that reflects 
census undercoverage has the effect of overstating income 
per capita. Geographically, the extent of overstatement may 
be assumed to be approximately proportional to the degree 
of census undercoverage in the particular geographic area. 

The BEA has made considerable effort to identify the 
ways in which Federal programs use these per capita income 
estimates. This in itself has not been easy. There is no 
systematic process by which data users notify data producers 
of their interest in and use of data. Hence, one can never be 
sure that all the uses have been identified. 

As far as I can discover, no studies have been made which 
attempt to trace through the impact of census undercoverage 
on programs using the per capita income data. This would 
seem to be an area crying for investigation. 

One can hypothesize that census undercoverage is greatest 
in low-income areas; that overstatement of per capita income 
therefore is greatest in low-income areas; and that the 
programs designed to reach low-income populations, there- 
fore, contain a systematic bias against distributing funds 
where they are most needed. 

A hypothesis is a proposition to be tested; it is not a 
conclusion. The fact that the above hypothesis sounds 
plausible does not make it correct. There is evidence that 
census undercoverage is greater in low-income areas. How- 


ever, the remainder of the above hypothesis has not been 
tested. It cannot be assumed that it is correct. The studies 
which have been made of the impact of undercoverage on the 
revenue-sharing program and on unemployment rates have 
yielded results which sometimes conflict with earlier intuitive 
assumptions. Study of the effect of census undercoverage on 
programs using BEA per capita income data is appropriate; 
conclusions about who, if anyone, is being shortchanged 
would be premature. 


My first reaction in attempting to structure an examination 
of the effects of census undercoverage on Federal programs 
is a feeling 
of overwhelming confusion. There are a multitude of 
programs; each one is different; each one is complicated. 
More investigation and more thought are needed before 
forming a judgment as to whether adjustment of census data 
for purposes of program use is feasible or would contribute 
significantly to the achievement of program goals. 

There are certain basic points that should be kept in mind 
as study and discussion go forward. 

1. We need to be concerned, not just with population 
counts, but with socioeconomic characteristics of the 
population. For obtaining reliable information on 
population characteristics, statistical adjustment is no 
substitute for an effort to obtain as complete a census 
count as possible. I stress this obvious point because it 
has from time to time been suggested that the 
Government could save a good deal of money by 
making less effort to achieve a complete count and 
placing more reliance on statistical adjustment. If we 
needed only a population count, this might be correct. 
However, program uses of census data demand know- 
ledge of population characteristics very difficult to 
impute accurately to uncounted people. The Census 
Bureau has been correct in placing its primary 
emphasis on obtaining improved census coverage. None 
of us should allow ourselves to be distracted from 
efforts to assist in achieving this. 

2. We are concerned not just with direct use of census 
data or even with direct use of postcensal estimates. 
The census data enter into other statistics used in 
Federal program administration. Although these uses 
are well known, there has been little examination of 
the impacts of census undercoverage. To focus only on 

the obvious direct uses of census data is to fail to see 
the forest for the trees. Such limited vision could lead 
to poor decisions on the question of undercount adjustment. 

3. In reality, the impacts of undercoverage may be 
different from what we think. More accurate data are 
desirable in and of themselves, and adjustment for 
undercoverage certainly deserves the study and atten- 
tion it is receiving on that ground alone. However, 
those whose concern about undercoverage stems from 
the assumption that those for whom they speak are 
being shortchanged on Federal program assistance 
might be well advised to look more carefully at the 
extent to which this is or is not true on a compre- 
hensive basis. 

4. Finally, while data producers must strive to meet the 
needs of data users, data users also have some 
responsibility to try to avoid impossible demands. 
Many of our funding programs demand degrees of 
detail in terms of geography and of population 
characteristics which cannot be reliably produced by 
any known method. Also, programs come and go and 
change their design with considerable rapidity. The 
broad design of data-collection programs should 
respond to user needs, but it would scarcely be good 
policy to build every detail of census data collection 
around continuously shifting and sometimes poorly 
conceived user demands. Along with considering 
undercount adjustment to meet program needs, we 
also should consider redesigning program data require- 
ments to conform to the data reasonably likely to be 
available from a good data-collection program. 


Johnston, Denis F., and Wetzel, James R. "Effect of the 
Census Undercount on Labor Force Estimates." Monthly 
Labor Review, March 1969, pp. 3-13. 

Robinson, J. Gregory, and Siegel, Jacob S. "Illustrative 
Assessment of the Impact of Census Underenumeration 
and Income Underreporting on Revenue Sharing Alloca- 
tions at the Local Level." Paper presented to the 
American Statistical Association, August 1979. 

U.S. Department of Commerce, Bureau of the Census. 
"Coverage of Population in the 1970 Census and Some 
Implications for Federal Programs." Current Population 
Reports, Special Studies, Series P-23, No. 56, August 1975. 

The Impact of the Undercount on State and 
Local Government Transfers 

Herrington J. Bryce 

Academy for Contemporary Problems 

Billions of dollars from the Federal Government are distri- 
buted annually among State and local governments on the 
basis of their population size. In addition to Federal funds, 
State governments also distribute revenues to their localities 
on the basis of population size. Although there are currently 
no precise estimates, it is accurate to conclude that literally 
tens of billions of Federal and State dollars are distributed 
on the basis of population. 

This paper considers the impact of a census undercount 
on this distribution process. It looks at some specific pro- 
grams, identifies potential losers and gainers, and analyzes 
the equity of readjustment of the census for the undercount. 


Population is the most often used factor in formulas for 
the distribution of Federal assistance to State and local 
governments. More than half, 83 of 146, of the Federal cate- 
gorical programs use some form of population as a factor 
in their distribution formulas. Total population is used in 28, 
while some specific segment of total population is used in 
the other 55. Population-based programs cover a wide range 
of government functions including education, housing and 
community development, criminal justice, employment and 
training, social welfare, transportation, water, and sewer [1]. 

In addition to appearing in distribution formulas, popula- 
tion also determines the eligibility of a jurisdiction for assis- 
tance. In the Comprehensive Employment and Training Act 
of 1973 (CETA), for example, State and local governments 
with a population of at least 100,000 qualify as prime spon- 
sors. Smaller jurisdictions may qualify as a group numbering 
100,000 in population, or if specifically authorized by the 
Secretary of Labor. 

In some programs, population plays a role in determining 
distribution because the variables used are derivatives of 
population. This is true of those formulas that use per capita 
data or local employment or unemployment rates. For cities 
and counties, these rates are determined on the basis of 
population. Thus, while CETA titles II, IV, and VI do not 
explicitly include population, the allocation of funds through 
these titles is affected by the accuracy of the population 
count as well as the count of the low-income population, 
which is a factor in the allocation of titles II and IV funds. 

Population and State Distribution of Aid 

Population also plays an important role in determining the 
distribution of funds from States to their localities. Some 

State aid programs specifically require the use of census pop- 
ulation counts. This is the case for the cigarette, gas, and 
liquor taxes in the State of Oregon. In other cases, however, 
alternative data may be used. Again, in the State of Oregon, 
State revenue sharing is based on population data provided 
by the demographic unit at Portland State University. Those 
data are certified by the State and are used to distribute 
revenue-sharing funds among the localities. 

Where alternative data are admissible, the impact of the 
undercount might be diminished. Yet, it is to be understood 
that even by using their own updates of population, 
jurisdictions do not entirely escape the impact of the census 
count. The State of New Jersey has estimated its population 
by jurisdiction for 1977 and 1978, for example, using 
alternative methods. The housing-unit method of estimation 
it uses is very much grounded in the 1970 census count, even 
though these figures are adjusted. Often the census count is 
the beginning point or the control used in alternative estimates 
by the Federal Government, jurisdictions, or private 
organizations. 

Education accounts for 60 percent, highways for 6 per- 
cent, and general government for 10 percent of State aid 
to local governments [11]. From a functional point of view, 
general local government support, followed by aid for high- 
ways, are the areas in which the dollar volume of State assis- 
tance to local governments is most tied to total population 
size. These are followed by education, which, although 
accounting for nearly 60 percent of State aid to localities, 
has its aid determined not on the basis of total population 
but on school-age population. 

Highway aid from States to local governments is sensitive 
to population counts, in part, because of the way aid based 
on receipts from the motor vehicle fuel sales tax is distri- 
buted. The majority of the States with such a tax distribute 
the receipts by using population as one of the factors in the 
formula. Exceptions to this rule include the States of New 
York, Louisiana, Maryland, and Virginia. 

The views expressed are solely those of the author and 
do not necessarily represent the views of any organiza- 
tion with which he is associated. The author wishes to 
thank Deborah Washington, Gwendolyn Cunningham 
and Evelyn Harvey for typing this manuscript and 
acknowledges the assistance of the Charles F. Kettering 
Foundation (National Urban Roundtable), the 
Academy, the NAACP, and the Census Bureau. 



The sensitivity of general local government support to the 
population-size factor is often a reflection of how some 
consumption or nuisance taxes are distributed. In Maryland, 
receipts from the parimutuel and cigarette taxes are distri- 
buted to counties and the city of Baltimore on the basis of 
population. Louisiana distributes receipts from the cigarette 
tax also on the basis of population. Minnesota, Nevada, 
Oregon, Indiana, and Kansas distribute revenues from this 
tax as well as the alcoholic beverage tax on the basis of 
population. 

The alcoholic beverage tax is also used for general support 
of local governments and distributed on the basis of popula- 
tion in the States of Indiana, Alabama, North Carolina, 
South Dakota, Oklahoma, Rhode Island, South Carolina, 
Tennessee, Utah, Virginia, Wisconsin, and Washington. 

Adequacy of the Population Variable 

The use of population as a factor for distributing Federal 
funds to State and local governments as well as from States 
to their localities is based on a number of assumptions. It is 
assumed that population size is an indicator of cost. There is 
some strong empirical evidence to suggest that cost is related 
to population size of government. However, the mathema- 
tical relationship is nonlinear— not linear, as is frequently 
assumed in State aid formulas [7]. 

It is also assumed that population size is related to need, 
and that the population variable is a workable proxy for local 
need. More precisely, it is assumed that a larger population 
implies a greater need. Yet, 80 percent of the 50 cities with 
the highest rates of poverty are small cities with a population 
of 100,000 or less [3]. The issue here is not only that need 
has various facets, but that we must distinguish between the 
intensity of a need and its scale. From the above example 
of poverty, it is clear that the intensity of poverty is greater 
in small cities, but the scale is greater in larger cities. 

Needs have several dimensions, not all of which are 
equally represented by population size [4]. Recognizing this, 
some have suggested that other variables, which are a better 
and more direct representation of the need being addressed, should 
be substituted for population. This argument has at least 
three practical weaknesses recognized even by advocates 
of alternative data. First, the data for many of these other 
variables are not available as often or in as uniform a manner 
as population. Second, many of these variables are subject 
to their own measurement errors. Thus, for example, the 
rates of employment derived from industry or establishment 
data do not reflect self-employment or employment in the 
farm sector. They must be adjusted. Third, some of these 
variables are themselves derivatives of population and, 
consequently, are not immune to the undercount. A case in 
point is the unemployment or employment data used to 
allocate CETA funds. These data are derived from a tortuous 
series of adjustments that lead to estimates for labor market 
areas. A county's share of the labor market's unemployed 
new entrants is based on its share of the market's popula- 
tion of persons aged 14 to 19 in 1970. The county's share of 
unemployed reentrants is based on its share of the market's 
population 20 years old and over in 1970. (Its share of the 
unemployment of the experienced labor force is based on 
its proportion of recipients of benefits during the reference 
period.) The city's share of the county's unemployed and 
employed is based on its share of these two factors in the 
county in 1970. A census that severely undercounts blacks 
and Hispanics would underestimate unemployment (a key 
factor in the CETA allocation criteria) in those cities with 
a high proportion of blacks and Hispanics. The undercount 
in 1970 was most severe within the black and presumably 
Hispanic labor-force age range. 
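The derivative chain described above can be sketched in code. The shares and totals below are hypothetical, and the allocation steps are simplified, but the structure shows how 1970 census age-group counts flow straight into a county's CETA unemployment estimate:

```python
# Sketch of the CETA-style derivation chain (figures hypothetical,
# steps simplified): a county's unemployed new entrants and reentrants
# are apportioned by its share of 1970 census population in the
# relevant age groups; only the experienced component comes from
# benefit records. An undercount in those age groups therefore
# understates the county's estimated unemployment.

def county_unemployed(market, county):
    new_entrants = market["unemployed_new_entrants"] * (
        county["pop_14_19_1970"] / market["pop_14_19_1970"])
    reentrants = market["unemployed_reentrants"] * (
        county["pop_20_plus_1970"] / market["pop_20_plus_1970"])
    experienced = market["unemployed_experienced"] * (
        county["ui_claimants"] / market["ui_claimants"])
    return new_entrants + reentrants + experienced

market = {"unemployed_new_entrants": 5_000, "unemployed_reentrants": 8_000,
          "unemployed_experienced": 12_000, "pop_14_19_1970": 120_000,
          "pop_20_plus_1970": 700_000, "ui_claimants": 10_000}
county = {"pop_14_19_1970": 30_000, "pop_20_plus_1970": 140_000,
          "ui_claimants": 2_500}
print(round(county_unemployed(market, county)))
```

Shrinking the county's 1970 census counts while the labor-market totals stay fixed reduces the first two components directly, which is the mechanism by which an undercount of blacks and Hispanics would understate unemployment in cities where they are concentrated.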

Studies have shown that even those formulas that weigh 
population heavily lead to a reasonably efficient distribution 
of Federal funds— to the extent that efficiency means target- 
ing to those areas of greatest need, where need is defined as a 
composite of socioeconomic variables [6]. Admittedly, some 
of these composite indicators of need include some aspect 
of population. Population therefore appears in some form on 
both sides of the equation; e.g., in defining need and in deter- 
mining the distribution of funds. In particular, population 
often appears as a change variable. It is assumed that "de- 
clining" cities have different needs than "growing" cities. 
Population decline is highly correlated with fiscal strain, 
the age of the housing stock, and the conditions of the 
infrastructure [2]. 

Limits of the Population Variable in the 
Distribution Process 

It would appear that, pari passu, an undercount would 
lead to an erroneous distribution of funds in those instances 
where population is a factor in the distribution formula or 
is used as a major element in determining eligibility. In 
practice, however, the impact of the undercount is limited 
by several factors. First, 
population is rarely the sole determinant of either eligibility 
or the amount of funds a jurisdiction gets based on the use 
of a formula. Yet, it usually has a very strong influence be- 
cause of a large variance, i.e., cities vary more according to 
population and population growth than along other variables. 
Second, the population variable may be highly correlated 
with the other variables in the formula. 

Third, many programs have limits— minima and maxima— 
that in turn limit the impact of population. Fourth, some 
programs have alternative formulas in which population is 
weighted differently, and a jurisdiction has the right to use 
the formula that gives it the most favorable treatment. Fifth, 
population is used in some formulas as an "impaction" or 
relative variable, so that it is not the absolute but the relative 
level (used as a weight) that matters. Sixth, some programs 
provide the authority and funds to the Secretary for making 
adjustments necessary because of errors in the formula; 
presumably, these funds could be used to compensate for 
the undercount. Seventh, the effects of an undercount can 
be offset where a consortium is possible. CETA provides 
for meeting the 100,000-population minimum through a 
consortium. Eighth, often the population variable is only a 
segment of the total population; unfortunately, the target 
population is frequently one that is likely to be seriously 
undercounted. This is the case where low-income population 
is a target. It is also the case where labor-force information 
is important because the undercount rate of black males of 
labor-force age is extremely high. Ninth, alternative methods 
and data used to estimate population in nondecennial years 
differ in the extent to which they reflect the undercount. 


In the preceding sections, we established that population 
size (total and by segments) is an important variable in al- 
locating State and Federal Government assistance, and that 
these funds go toward financing vital State and local func- 
tions. In this section, we look at the impact of the under- 
count on specific programs. Because each program uses 
population differently, we can only obtain an accurate assess- 
ment of the impact of the undercount on a recipient by looking 
at each program separately. To fully appreciate the under- 
count and its effects on each of these programs, we must go 
through the tedious exercise of understanding the allocation 
process of each program. 

Sometimes knowing the Federal allocation formula is not 
enough to determine the losers and gainers of an undercount. 
This is the case where Federal assistance to States amounts to 
a pass-through of aid to localities and where States and 
localities are permitted to set up independent distribution 
procedures. Title I of the Elementary and Secondary Educa- 
tion Act of 1965 as amended, for example, distributes funds 
to counties mostly on the number of children 5 to 17 years 
of age who are determined to be from poor families. How the 
counties and States distribute these funds to school districts 
may differ from State to State [8] . In other cases, while aid 
goes to a locality, the real losers are a specific class of resi- 
dents in that locality rather than the jurisdiction in general. 

The number of programs using population is too great 
for us to consider each independently. For this reason, in 
this section we shall concentrate on three programs: General 
Revenue Sharing, the Community Development Block 
Grant, and the Farmers Home Administration Housing 
Program (Section 502). 

General Revenue Sharing 

The State and Local Fiscal Assistance Act of 1972 pro- 
vides for the distribution of $55 billion to State and local 
governments. Some 39,000 units of government use these 
funds for a variety of functions ranging from police to 
education. The main feature of this program is that it 
provides a minimum of strings and a maximum of flexibility 
of use by the recipients. 

Allocation formula. Each year, funds are allocated first 
among States using a five-factor or a three-factor formula, 
whichever gives the individual State the greater allotment. 
The five-factor formula is based on population, urbanized 
population, population inversely weighted for per capita 
income, income tax collections, and general tax effort. The 
three-factor formula uses population, general tax effort, and 
relative income. 
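The "whichever gives the greater allotment" mechanic can be sketched in a few lines of code. This is an illustrative simplification, not the statutory computation: the equal weighting within each formula, the factor values, and the function names are all hypothetical, with each factor expressed as a State's share of the national total.

```python
# Illustrative sketch of the "whichever is greater" rule -- NOT the
# statutory computation. Factor values and equal weights are hypothetical;
# each factor is the State's share of the corresponding national total.

def five_factor_share(pop, urban_pop, pop_per_capita_income, income_tax, tax_effort):
    # Simplified equal-weight average of the five factor shares
    return (pop + urban_pop + pop_per_capita_income + income_tax + tax_effort) / 5

def three_factor_share(pop, tax_effort, relative_income):
    # Simplified equal-weight average of the three factor shares
    return (pop + tax_effort + relative_income) / 3

def state_allotment(total_funds, five, three):
    # Each State receives the greater of its two formula shares
    return total_funds * max(five_factor_share(*five), three_factor_share(*three))

# Hypothetical State holding 2 percent of most factors but 3 percent of
# tax effort: the three-factor result (about 2.33 percent) beats the
# five-factor result (2.2 percent).
print(round(state_allotment(1_000_000,
                            (0.02, 0.02, 0.02, 0.02, 0.03),
                            (0.02, 0.03, 0.02)), 2))  # → 23333.33
```

The max() step is what lets a State partially offset an undercount in one factor by switching formulas, a point the text returns to below.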

Once the distribution among States is accomplished, one- 
third of each State's allocation is awarded to the State gov- 
ernment for its use. The remaining two-thirds is divided by 
the State population size to obtain an average per capita 
grant for that State. As we shall see, this figure provides an 
upper limit to the allocation for each jurisdiction. 

The local government pool for each State is distributed 
among county areas (not necessarily county governments) 
on the basis of relative population, general tax effort, and 
relative income of each area. No county area may obtain 
more than 145 percent or less than 20 percent of the per 
capita figure referred to above. 

Each county area allocation is divided up such that an 
amount goes to Indian tribal governments on the basis of 
their population relative to the population of the total 
county area. From the balance, a county government's allo- 
cation is determined on the basis of the government's ad- 
justed taxes in the county area. No county government is 
allocated more than 50 percent of its adjusted taxes and 
transfers. The remaining portion is allocated among cities 
and townships on the basis of a single formula, using popula- 
tion, general tax effort, and relative income. No unit of local 
government or township may receive more than the 145- 
percent upper limit or less than the 20-percent lower limit 
of the average per capita allotment referred to above. If a 
unit receives less than the 20 percent, its allotment is in- 
creased to either the 20-percent level or 50 percent of its 
adjusted taxes and transfers, whichever is lower. 
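The floor and ceiling rules just described can be sketched as follows. This is a simplified illustration with hypothetical dollar figures; "avg_pc" stands for the State's average per capita grant referred to above.

```python
# A sketch of the 145-percent ceiling and 20-percent floor on local
# allotments. All dollar figures below are hypothetical.

def constrained_allotment(raw, population, avg_pc, adjusted_taxes_and_transfers):
    per_capita = raw / population
    ceiling = 1.45 * avg_pc
    floor = 0.20 * avg_pc
    if per_capita > ceiling:
        # Excess above the ceiling is redistributed among other units
        return ceiling * population
    if per_capita < floor:
        # Raised to the 20-percent level or 50 percent of adjusted
        # taxes and transfers, whichever is lower
        return min(floor * population, 0.5 * adjusted_taxes_and_transfers)
    return raw

# A locality of 1,000 people in a State averaging $10 per capita: a raw
# allotment of $1,500 falls below the $2,000 floor and is raised to it.
print(constrained_allotment(1500, 1000, 10, 5000))          # → 2000.0
# A raw allotment of $20,000 exceeds the $14,500 ceiling and is capped.
print(round(constrained_allotment(20000, 1000, 10, 0), 2))  # → 14500.0
```

These clamps are why, as the text notes below, an undercounted locality already at a limit may see no change from a corrected count.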

Impact of an undercount. Based on the above procedure, 
it is clear that the allocation of general revenue sharing funds 
is tied to an accurate count of total as well as urbanized 
population in each State. For the State, the impact of the 
undercount can be reduced somewhat by moving from one 
formula to the other. Note, however, that this option is 
not available to the localities in the State. 

The major defense built in for localities is the 20-percent 
or 50-percent lower limit. As long as these lower limits are 
not reached, the only hope of a locality which has been 
undercounted is that some other jurisdiction has surpassed 
its 145-percent limit, thereby triggering a redistribution of 
the excess. On the other hand, a corrected count would not 
help a locality which has been undercounted if that locality 
has already surpassed its upper limit. 


Assuming that a locality has experienced an undercount 
and would not be at its limits, that locality would be a loser 
as a result of the undercount. The State also might be a loser, 
provided that its population count reflects the undercount. 
This, of course, depends upon the method used to reconcile 
differences between the State count and the counts of each 
of its components. In any event, there are more buffers for 
the State than for the locality, although both could con- 
ceivably be losers as a result of the undercount. 

If the locality's undercount is not expressed in the State 
figure, then any adjustment to the locality's figure results 
in an intrastate redistribution. One study has shown that for 
the States of New Jersey and Virginia, such intrastate redis- 
tribution results in no more than a 5-percent change in the 
allotment of all jurisdictions in the State [10]. No jurisdiction 
gained more and none lost more than 5 percent. It should 
be noted, however, that this is a one-time loss. To the extent 
that the undercount is reflected in annual allotments through 
estimates using the undercounted population as a base (which 
is true of most methods of population estimates), the under- 
counted entity experiences an annual loss until a true count 
is obtained. 

To the extent that the undercount is reflected in the 
State population total, an adjustment will result in an inter- 
state distribution of funds. The extent to which there are 
gains and losses in an interstate distribution is not bounded 
by limits such as the per capita entitlement, which limits 
intrastate allocations. Yet, at least one study has shown 
that the gains and losses resulting from a readjustment of 
State figures will hardly approach 4 percent [9]. 

In the case of general revenue sharing, the losers and 
gainers are jurisdictions (States and local governments) as 
opposed to identifiable individuals or an identifiable class of 
individuals or jurisdictions. This is so because general revenue 
sharing allotments enter local budgets unearmarked for 
specific beneficiaries. This is not the case in all programs. 

The Community Development Block Grant 

The Community Development Block Grant (CDBG) 
program combines several former categorical programs ad- 
ministered by the Department of Housing and Urban Devel- 
opment. These categorical programs included funds for urban 
renewal, neighborhood development, water and sewer 
systems, and "open space"; public facility loans; neighbor- 
hood facilities grants; and Model Cities supplemental grants. 
Assistance from these programs was obtained on a com- 
petitive basis with no assurance of long-term funding. The 
recipients of these awards were also required to follow 
detailed rules and regulations. Often, local priorities were 
sacrificed in an attempt to obtain assistance and to imple- 
ment these programs. Since its beginning, some $15.5 billion 
has been appropriated to CDBG. 

The general objective of the Community Development 
Block Grant program is to provide for viable communities 
through improved housing and neighborhood conditions and 
through expanded economic opportunities primarily for low- 
and moderate-income persons. Specifically, the 1974 act, as 
amended, aims to: 

1 . Eliminate and prevent slums and blight; 

2. Eliminate conditions detrimental to health, safety, and 
public welfare; 

3. Conserve and expand the Nation's housing stock; 

4. Expand and improve the quantity and quality of 
community services; 

5. Improve the rational use of land and other natural resources; 

6. Reduce the concentration and isolation of income 
groups in specific neighborhoods; 

7. Restore and preserve historic sites; and 

8. Alleviate physical and economic distress. 

As a mechanism for meeting these national objectives, the 
Community Development Block Grant, unlike its prede- 
cessors, is designed to give considerable discretion to local 
governments. These governments may choose among the 
eight national objectives in a manner that respects their local 
conditions and the desire of local citizens. In addition, 
unlike its predecessors, the CDBG program provides for a pre- 
dictable flow of funds to local governments on an annual basis. 

Allocation formula. The allocation of the CDBG funds 
occurs in the following manner: 80 percent of the annual 
appropriation is set aside for metropolitan areas and 20 per- 
cent for nonmetropolitan areas. 

The metropolitan allotment is divided between urban 
counties and entitlement cities. In order to be eligible, an 
urban county must have a population of at least 200,000 
residents (excluding central cities) and have the authority 
to conduct community development activities. A metro- 
politan entitlement city must have a population of at least 50,000. 

The entitlement to each metropolitan city is based on 
the greater of the amounts derived from one of two formu- 
las. The first formula uses the population of the city relative 
to the population of all metropolitan areas, the extent of 
poverty in the city relative to all metropolitan areas, and the 
extent of housing overcrowding in the city relative to all 
metropolitan areas. In this formula, population and over- 
crowding each get 25 percent of the weight, while poverty 
gets 50 percent. 

The second formula uses the extent of population growth 
lag in the metropolitan city relative to all metropolitan 
areas. Growth lag is the difference between the actual growth 
of an area (using 1960 population and boundaries as a base) 
and the growth it would have experienced had it grown at 
the same rate as similar areas in the Nation as a whole. The 
formula also uses the extent of poverty in the city relative 
to all metropolitan areas and the age of housing in the city 
relative to all metropolitan areas. In this formula, population 
growth is assigned 20 percent of the weight, poverty 30 
percent, and age of housing, 50 percent. 

Similarly, an urban county's entitlement is calculated by 
a dual formula in which the variables given above are ex- 
pressed as the conditions in that urban county relative to 
all metropolitan areas and in which the weights are the same. 
Like the metropolitan city, the urban county chooses the 
formula that gives it the greater entitlement. 
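The dual formulas and the growth-lag calculation described above can be sketched as follows. All input values are hypothetical shares of the corresponding all-metropolitan totals, and the function names are ours, not HUD's.

```python
# A sketch of the two CDBG entitlement formulas. Inputs are a city's
# (hypothetical) shares of the corresponding all-metropolitan totals.

def formula_one(pop_share, poverty_share, overcrowding_share):
    # Population 25 percent, poverty 50 percent, overcrowding 25 percent
    return 0.25 * pop_share + 0.50 * poverty_share + 0.25 * overcrowding_share

def growth_lag(pop_1960, pop_now, reference_growth_rate):
    # Shortfall below the growth the city would have had at the
    # reference rate; zero if the city kept pace
    return max(pop_1960 * (1 + reference_growth_rate) - pop_now, 0)

def formula_two(lag_share, poverty_share, housing_age_share):
    # Growth lag 20 percent, poverty 30 percent, age of housing 50 percent
    return 0.20 * lag_share + 0.30 * poverty_share + 0.50 * housing_age_share

def entitlement_share(f1_inputs, f2_inputs):
    # The city is entitled to the greater of the two formula amounts
    return max(formula_one(*f1_inputs), formula_two(*f2_inputs))

# A city of 100,000 in 1960 holding 95,000 now, while similar areas
# grew 10 percent, lags by about 15,000 people.
lag = growth_lag(100_000, 95_000, 0.10)
```

Because the entitlement is the max of the two amounts, a distressed city with a large growth lag and old housing is typically carried by formula two, which matters for the undercount analysis that follows.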

The law also provides an allotment that would protect 
cities that have a smaller entitlement than they historically 
received from the categorical programs that were combined 
into CDBG. Once the entitlements have been determined and 
the "hold-harmless" protection has been arrived at, the 
so-called metropolitan discretionary balance is allocated for 
the direct use of smaller metropolitan cities and counties or 
to the States for use within metropolitan areas. 

The allocations of the discretionary balance are derived 
from a formula that is based on the population of the metro- 
politan areas in a State relative to the population of metro- 
politan areas of all States, the extent of poverty in the 
metropolitan areas of a State relative to poverty in the 
metropolitan areas of all States, and housing overcrowding 
in the metropolitan areas of a State relative to such over- 
crowding in the metropolitan areas of all States. The poverty 
factor gets 50 percent of the weight in this formula, while 
the population and overcrowding each get 25 percent. 

An alternative formula, which may be used in allocating 
this metropolitan discretionary balance, uses the age of 
housing, the extent of poverty, and population in the metro- 
politan area of the State as these factors compare to the 
metropolitan areas in all States. These variables are expressed 
as relatives (as in the first formula), but the weights are dif- 
ferent. Age of housing gets 50 percent of the weight, poverty 
gets 30 percent, and population gets 20 percent. 

The 80 percent set aside for metropolitan areas is distri- 
buted according to the procedures described above. The 20 
percent set aside for nonmetropolitan areas is distributed in 
essentially two steps. The first meets the hold-harmless com- 
mitments to cities in nonmetropolitan areas. Once this has 
been done, the balance (the nonmetropolitan discretionary 
balance) is allocated among States for use in nonmetropolitan 
areas on the basis of one of two formulas, whichever gives 
the greater amount to the nonmetropolitan areas of the 
State. The first formula uses the population of the nonmetro- 
politan areas of a State compared with the nonmetropolitan 
areas of all States, the extent of poverty in the nonmetro- 
politan areas of the State compared with the nonmetropolitan 
areas of all States, and the extent of housing overcrowding 
in the nonmetropolitan area in the State compared with the 
nonmetropolitan areas of all States. In this formulation, 
poverty is assigned 50 percent of the weight, and population 
and housing overcrowding are each assigned 25 percent. 

The alternative formula uses the age of housing in the 

nonmetropolitan area in the State compared with similar 
areas in all States, the extent of poverty in the nonmetro- 
politan area of the State compared to all States, and the 
population of the nonmetropolitan area of the State com- 
pared to the same areas in all States. In this formulation, 
the age of housing is assigned 50 percent of the weight, 
poverty, 30 percent, and population, 20 percent. 

Impact of an undercount. From the description of the 
allocation procedure, it is obvious that an undercount by any 
single jurisdiction, whether or not it qualified for an auto- 
matic entitlement for CDBG funds, affects not only that 
jurisdiction but all others in the State as a whole. This is 
because of the metropolitan and nonmetropolitan discre- 
tionary balances which are based on the populations of these 
areas in each State. Hence, even if a small nonmetropolitan 
city is not interested in being a recipient of such assistance, 
its population figure is used in determining the overall allo- 
cation to its State for nonmetropolitan areas in that State. 
A similar situation holds for metropolitan cities. 

The use of alternative formulas in which population is 
assigned different weights gives the localities a way of re- 
ducing (but not necessarily eliminating) the loss due to a 
population undercount. Recall that, in the case of general 
revenue sharing, only States had this protection. 

In the first formula, an undercount will simply lead to a 
reduced entitlement— presuming that a city is not denied 
entitlement status because of an undercount that puts it 
below 50,000. But, this is not necessarily the case in the 
second formula. This second formula is particularly impor- 
tant, since it is the one used by most distressed cities— those 
large cities that have slow or negative growth rates and a 
variety of other socioeconomic symptoms of distress. This 
formula, largely because of the population growth-lag 
variable, effectively concentrates substantial CDBG aid on 
these cities [5] . To understand this, let's work with a 
simple paradigm, which is illustrated in the appendix. Let 
us look at a city that suffers an undercount in the base year 
and ask what that undercount would do to that city's 
entitlement, assuming alternatively a city that has a growing, 
stable, or declining population. We shall also assume that the 
Census Bureau has achieved an accurate count in the current year. 

An undercount in the base year (1970) and an accurate 
count in the current year (1980) will hurt a growing city 
because it would exaggerate its true rate of growth and could 
lead to a zero value for that jurisdiction as far as the growth- 
lag variable is concerned. The alternative for that city is to 
shift to the first formula, where population size is the factor. 
But this may be no more than a loss minimization procedure 
rather than one that will yield no loss at all. 

For a declining city, an undercount in the base year with 
an accurate count in the current year would hurt if the cur- 
rent count is above the base-year figure. This situation will 
result in the city's registering a growth rather than a decline. 


The loss would be less severe if the city declines below the 
incorrectly enumerated level in the base year, for then it 
would be registered as a declining city with a definite growth 
lag. Even in this case, the city suffers a loss because its de- 
cline is undercounted. 

If the city is stable, under the same assumptions in the 
above paragraphs, it will be hurt. This is so because the city 
will be shown as a growth city. 

Let us turn to the case where the count in the base year 
was accurate, but there is an undercount in the current year. 
If the undercount yields a population size that is above the 
base year, then the city will be aided in the growth-lag 
formula, since its true growth rate will be underestimated. 
Of course, it will be aided even more if the undercount led 
to a figure below its base-year level. In that case, the city 
would be reported as declining. 

If the city is declining, an undercount in the current year 
will work to the advantage of that city because it will ex- 
aggerate its growth lag. To illustrate, a decline means that 
its true current population is below its population in the 
base period, and an undercount of its true current population 
means a figure below its level of decline. 

If the city is stable, an undercount will show this city as 
declining and therefore increase its allocation under the 
growth-lag formula. 

A generalization is therefore derived: An undercount in 
one period followed by a correct count in the next will 
always unfairly reduce the amount a jurisdiction received 
through the CDBG growth-lag formula by overestimating 
its growth. But, a correct count in one period followed by 
an undercount in the next will always unfairly increase the 
amount received by underestimating growth. This holds 
whether the jurisdiction is growing, stable, or declining. 
An undercount in both periods leads to uncertain results. 
(See the appendix.) 
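The generalization can be checked with simple hypothetical numbers. The sketch below assumes a stable city of 100,000 people in both years and a 10-percent reference growth rate for similar areas; all figures are ours, chosen only for illustration.

```python
# A numerical check of the generalization, using hypothetical figures
# for a stable city (100,000 people in both years) when similar areas
# grew 10 percent; its true growth lag is therefore 10,000 people.

def growth_lag(base_count, current_count, reference_growth_rate):
    return max(base_count * (1 + reference_growth_rate) - current_count, 0)

true_lag = growth_lag(100_000, 100_000, 0.10)                     # ≈ 10,000

# A 5-percent undercount in the base year with a correct current count
# makes the city appear to have grown, unfairly shrinking its lag:
lag_after_base_undercount = growth_lag(95_000, 100_000, 0.10)     # ≈ 4,500

# A correct base count with a 5-percent undercount in the current year
# makes the city appear to have declined, unfairly inflating its lag:
lag_after_current_undercount = growth_lag(100_000, 95_000, 0.10)  # ≈ 15,000

assert lag_after_base_undercount < true_lag < lag_after_current_undercount
```

The ordering of the three results reproduces the generalization: a base-year undercount reduces the growth-lag entitlement, and a current-year undercount inflates it.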

While this certainly implies that a city might be better 
off being undercounted in 1980, the incentives to avoid 
this undercount are great. First, the benefits— even using the 
population lag— are only the marginal differences between 
using the population-lag formula and the population-size 
formula. Second, the population-lag variable, while it might 
lead to a higher allocation, is only one of three variables in 
the equation. All three variables tend to be highly correlated 
with population growth; hence, if a city's growth rate were 
underestimated as a result of the undercount, the other 
variables will dampen the effect, since they also tend to be 
strongly correlated with growth. For example, a growth 
city will have a relatively low rate of poverty and a relatively 
young housing stock, and these tend to reduce allocations.¹ 

¹The formula contains three variables: growth lag, poverty, and 
age of housing. If population has grown rapidly (but its growth is 
underestimated because of the undercount), the city will be helped. 
But poverty and age of housing are inversely correlated with growth 
and positively correlated with allocations; hence, they will tend to 
offset or "dampen" the effect of the undercount that underestimated 
growth. 

Fourth, an undercount would work to the detriment of the 
city in many other population-based formulas, such as 
revenue sharing and CETA. 

An adjustment for the population undercount of an 
entitlement city or urban county would lead to an interstate 
redistribution of funds. Recall that there is no State allot- 
ment as in the case of general revenue sharing. Thus, the 
adjustment necessary to compensate a jurisdiction that is 
undercounted is spread over many other recipients and 
could appear as a minor reduction in their entitlement. 

Among nonentitlement cities (those funded from the 
discretionary balances), an adjustment would also lead to an 
interstate redistribution, since their figure theoretically 
should be reflected in the size of the metropolitan and 
nonmetropolitan populations used in the formulas. Unlike 
the entitlement case, however, the incentive for urging a 
correction on the part of a nonentitlement city is low. The 
reason for this is that the nonentitlement pool, while de- 
termined on the basis of a formula, is allocated on a com- 
petitive basis. There is no assurance that the undercounted 
jurisdiction would be the beneficiary of an adjustment that 
increased the size of the State allocation. Furthermore, since 
these nonentitlement cities are small (below 50,000 in popu- 
lation) and because the population variable is used as a 
relative rather than an absolute figure, the likelihood that an 
undercount in any one of these cities would markedly reduce 
the State discretionary allotment is very remote. The most 
likely losers and gainers from an incorrect population count 
are therefore entitlement cities, or cities that would have 
been eligible for entitlement if an undercount had not pro- 
duced a population figure of less than 50,000. 

The CDBG program also contains one element that could 
reduce the redistribution of funds, gainers, and losers as a 
result of the population undercount. The Secretary does 
have a discretionary fund that provides for, among other 
uses, adjustments resulting from inequities due to the allo- 
cation formula. 

The Farmers Home Administration Housing Program 

The Farmers Home Administration Housing Program aims 
at improving the quality of the housing stock in small juris- 
dictions. It began through the Federal Housing Act of 1949, 
which authorized home loans to farmers. Since that time, 
the program has expanded to include senior citizens and 
low- and moderate-income residents in towns up to 20,000. 
The program came about in recognition of the serious 
housing conditions and the shortage of mortgage credit in 
nonmetropolitan America. 

For purposes of this discussion, we shall concentrate on 
title 502 housing assistance programs, which made nearly 
$3 billion in loans in 1978. Under this title, the Farmers 
Home Administration is authorized by Congress to make 
loans for the purchase, repair, or building of modest homes for 
low- to moderate-income families. This program provides insured 
loans to families with an adjusted income up to $15,000 
and guaranteed loans to families with an adjusted income 
up to $20,000. Under the guarantee program, interest rates 
are negotiated between a commercial lender and the bor- 
rower, and 90 percent of the principal and interest are 
guaranteed by HUD. Under the insured-loan program, 
families may obtain an interest subsidy that would bring 
their interest from the commercial lender down to 1 percent; 
the maximum interest on these loans is 10 percent. 

Adjusted income is obtained by taking the total income of 
all adults expected to live in the house; 5 percent of this total 
and $300 for each child expected to reside in the house are 
then subtracted to get the adjusted figure. 
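The adjusted-income rule just stated amounts to a one-line computation, sketched below with hypothetical family figures.

```python
# Adjusted income as described: total adult income, less 5 percent of
# that total, less $300 per child expected to reside in the house.

def adjusted_income(total_adult_income, number_of_children):
    return total_adult_income - 0.05 * total_adult_income - 300 * number_of_children

# A hypothetical family with $16,000 of adult income and two children:
# 16,000 - 800 - 600 leaves an adjusted income of about $14,600, under
# the $15,000 insured-loan ceiling even though gross income exceeds it.
family_figure = adjusted_income(16_000, 2)
```

As the example shows, the deductions can move a family across an eligibility threshold, so the adjusted figure, not gross income, determines which loan program applies.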

Residents in all nonmetropolitan areas with populations 
less than 10,000 are eligible for this assistance. However, 
residents in areas over 10,000 but not greater than 20,000 
may be eligible if there is a shortage of credit either from 
governmental or private sources as certified by the Secre- 
taries of Housing and Urban Development and of Agriculture. 
In all these areas, loans are restricted to low- and moderate- 
income families as defined by their adjusted income. 

Allocation formula. The funds are allocated by the 
Federal Government to States according to a formula that is 
very sensitive to specific population segments. The formula 
for insured 502 loans uses the State's percentage of the 
national population, State's percentage of national rural 
population living in inadequate housing, State's percentage 
of national rural poverty, and cost of housing adjusted for 
population. Each variable in this formula has a 30-percent 
weight except the cost, which has 10 percent. The State di- 
rectors, in turn, allocate their funds among State districts 
(a composite of State counties), using the same formula. 
The districts then allocate the funds to their constituent 
counties, using the same formula. The counties make awards 
to eligible citizens within their jurisdictions. 
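The weighting just described can be sketched as a single weighted sum. The inputs below are hypothetical State percentages of the national totals, and the function name is ours.

```python
# A sketch of the insured 502 allocation formula: three factors at
# 30 percent each and the housing-cost factor at 10 percent. Inputs
# are hypothetical State shares of the corresponding national totals.

def insured_502_share(pop, rural_inadequate_housing, rural_poverty, housing_cost):
    return (0.30 * pop
            + 0.30 * rural_inadequate_housing
            + 0.30 * rural_poverty
            + 0.10 * housing_cost)

# The same formula is reapplied down the tiers (State to district to
# county), with each factor expressed relative to the next level up.
share = insured_502_share(0.02, 0.03, 0.04, 0.02)   # ≈ 0.029
```

Note that 90 percent of the weight rides on population segments rather than total population, which is why the impact discussion below turns on the accuracy of counts for specific segments.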

The guaranteed loans are distributed by the States, using a 
formula that is slightly different from that used to distribute 
insured loans. Instead of the poverty population, it 
counts middle-income population, i.e., the State's percentage 
of national rural households with incomes between $15,000 
and $20,000. 

Impact of the undercount. Even though this formula has 
a heavy population content, the undercount of the total 
population has little to do with fund distribution. Distribu- 
tion is based on specific segments of the population and, 
therefore, to suffer a loss, a jurisdiction will have to experi- 
ence an undercount in that segment. Specifically, the seg- 
ments are rural population, rural population living in in- 
adequate housing, rural population below the poverty level 
threshold, rural households 62 years of age and older, and 
rural households with incomes between $15,000 and 
$20,000. An undercount affecting these specific segments 
will hurt, but we know little about the accuracy of the 
counts in these groups. 

It should be noted that while Federal dollars are distri- 
buted to States, their ultimate destination is a specific class 
of individuals in specific types of counties. For all intents 
and purposes, then, the counties or the towns are the 
jurisdictions at risk, not the State. 

The counties at risk are affected by the State rural totals, 
since these determine the State pool from which the counties 
draw. A higher count for a county does not lead initially to 
a redistribution among counties in its district, assuming that 
the correct count is reflected in the new State total. But 
because any given county is likely to be a very small part of 
the total national rural or even State rural population, ad- 
justing for its undercount is likely to lead to a very small 
redistribution among States. The added dollars could be 
significant, however, in terms of that specific county's 
allocation or needs. 

While the county is the jurisdiction at risk, it is not the 
ultimate loser. Unlike the other programs described in this 
paper, 502 assistance does not go to create neighborhood or 
community-wide goods or services, and it does not become 
part of the general funds available to the local or State juris- 
dictions. The beneficiaries of the 502 housing program are a 
specific class of eligible residents in specific types of counties 
or towns. Therefore, the loser can clearly be identified as a 
household rather than a jurisdiction. 

The loss to such a household in a specific county is 
limited by the credit-carrying capacity of that household, 
since the amount of assistance available to it is a function 
of that factor. The loss falls upon a household that would 
otherwise have been assisted had the county received its due allocation. 


The argument is often put forward that a correction of 
one jurisdiction's count will lead to a reallocation of funds, 
such that one jurisdiction can only be made better off at 
the expense of another. This argument is sapped of its 
strength when we consider the following: 

1. At the outset of every fiscal year, calculations are made 
anew. At that point, unless protected by a specific 
hold-harmless clause, the specific allocation to each 
jurisdiction is unknown and undetermined. Thus, there 
is not actually a taking from one recipient and a giving 
to another. 

2. Legislation setting up an assistance program carries the 
express intent of the program, and population is con- 
sciously chosen as a factor or a proxy to represent the 
problem that is being addressed. Any reallocation that 
is based on an accurate population count is therefore 
an improvement over one that is based on an inaccurate count. 

3. To the extent that the assumption of a diminishing 
marginal utility of a dollar is correct, a reallocation of a 
dollar from a richer to a poorer recipient represents a 
movement toward a general social improvement or 
Pareto optimality. Consequently, a reallocation that 
increases the dollar received by a poorer recipient— the 
general direction of intergovernmental aid— even by 
denying that dollar to richer recipients is an improve- 
ment in social welfare. The rich need a dollar less than 
the poor. 
4. In most cases, the dollar requirement to adjust for the 
undercount of any recipient is spread over a large num- 
ber of other recipients; for each of these other recipi- 
ents, the dollar adjustment is likely to be small and 
lead to a relatively insignificant change in its allocation. 
To the losing jurisdiction or household, the loss could 
be substantial relative to its need. 

Politics, Economics, and Misconceptions 

There are several considerations that reduce the vigor with 
which the case for an adjustment is pursued. They include 
politics, economics, and misconceptions. 

First, those who lose because of an undercount are often 
unaware of their losses. This is so because the undercount 
is an intransitive tax. That is, we do not literally take from 
a recipient. We do not announce how much each should 
have gotten. Therefore, persons and jurisdictions alike might 
not fully internalize the cost to them of an undercount and 
won't pursue adjustments. 

Second, for some, the cost of an adjustment is higher than 
the perceived benefits. For some jurisdictions, to prove and 
pursue legal and administrative actions to correct an under- 
count is not only costly but risky, since there is no assurance 
that their estimates will be accepted or, if they are, that they 
will work consistently or sizably to the jurisdiction's
advantage.
Third, sometimes it is not to the political or economic 
advantage of a larger jurisdiction (State or county) of which 
an undercounted jurisdiction is a part to help pursue an 
adjustment. In the case of revenue sharing, an adjusted count
for a single sub-State jurisdiction could lead more to an
intrastate redistribution than an interstate redistribution;
that is, the possibilities that jurisdictions within a State will
band together on adjusting the count of a single jurisdiction
are diminished.

Fourth, where the losers are a class of persons or juris- 
dictions, rather than an identifiable entity, the incentive to 
pursue an adjustment is lessened, since no single member of 
that class has any assurance that if an adjustment is made it 
will benefit. This is the case of the CDBG metropolitan or 
nonmetropolitan discretionary funds, where State alloca- 
tions are based on population but intrastate distribution is 
based on competition among applicants. It is also true of 
the Farmers Home Administration Housing 502 program. 

Fifth, the equity question is frequently clouded by as- 
suming that it is rich versus poor. Indeed, this vertical equity 

is often the case. When it is, the welfare argument for a 
readjustment is clear-cut and illustrated by the numerous 
accounts of less needy areas using allocations for activities 
which are not basic and are questionable parts of the overall 
objective of the legislation. This is exemplified by the argu- 
ments that surround many government programs. There is 
also the issue of horizontal equity: To the extent that 
population reflects need, an undercount means that two 
jurisdictions that are fundamentally similar in size (and to 
that extent in need) are treated differently because of the
undercount.
Finally, the argument is frequently made that there is an 
equity issue involving correcting only the population variable 
when other variables in a formula also have errors. It appears 
that an incremental step toward equity, i.e., correcting only 
the population data, is superior to standing still. 


This paper discusses population as a key variable in the 
distribution of Federal aid to State and local governments 
and in the distribution of State aid to their localities. How 
the undercount affects this distribution process can only be 
ascertained on a program-by-program basis, since the role 
that population plays varies. 

Similarly, the impact of the undercount on any jurisdic- 
tion requires a detailed analysis of the mix of Federal and 
State assistance that the jurisdiction receives. The story is 
told that the city of Portland, Oreg., having discovered that 
its undercount led to a loss in general revenue sharing funds, 
sought to correct this, only to realize that the higher popu- 
lation figure would have severely reduced its CDBG entitle- 
ment, using the population-lag formula. To have used a 
corrected count would have led to a net loss. 

In this vein, it has been shown that while an undercount 
will reduce the entitlement of a city in most cases, in the 
particular case of the growth-lag variable used in the CDBG 
program, it could yield a higher entitlement than the city 
might otherwise have received. An undercount turns out to 
be potentially beneficial. 

Surely, there is an equity question associated with the 
redistribution that would occur as a result of a correction 
in the count. On balance, it appears to this author that the 
equity argument works in favor of making the adjustment. 
The argument is simple: A distribution based on an error 
which causes us to move away from the original intent 
takes us away from the equity state originally postulated 
as desirable. Any correction, therefore, brings us closer to 
the desired state and is an improvement in welfare as well
as in equity.

In considering the equity question, an error is often made 
in assuming that (owing to an adjustment of the count) funds 
will be taken from one jurisdiction and given to another. 
Actually, few localities have prior claims on a specific dollar 
amount of a program appropriation. They are normally 


informed of their legal claims after the allocation procedure 
is conducted in a central State or Federal bureaucracy. 
Dollars are distributed in the computation process only. 
Further, it is illogical to conclude that one has lost that 
which one should never have received. 

Concerning the dollar amount, the few studies that have 
been done indicate that the dollar gain or loss is generally 
below 5 percent. It might be legitimate to consider that this 
quantity is not worth fighting for or to conclude that it is 
too costly to try to achieve. But these are economic— not 
equity— considerations. The equity consideration would ap- 
propriately relate to the utility (not the amount) of the 
dollars in a needy city compared to the utility in a city 
which is not needy and is obtaining more than was intended. 
Furthermore, a 5-percent loss over several years of a major 
program is by no means peanuts. 

This paper has also tried to identify losers and gainers 
resulting from an undercount and its correction. Sometimes 
these might be persons or jurisdictions. On other occasions 
the losers might be a class of persons or a class of juris- 
dictions. Because these programs have effects that go beyond 
the individual beneficiary, the consequence of an undercount 
is shared by a wide public. 

Hence, while we have clearly identified specific groups 
or entities as losers, it should be noted that since most 
government programs produce externalities— that is, the 
benefits are enjoyed by persons or entities other than the 
recipients— the ultimate losers are frequently more than the 
single person or entity. The Farmers Home Administration 
Program is aimed essentially at a low- or moderate-income 
household, but to the extent that those funds are used to 
substantially improve a house, they benefit the neighborhood 
in which the house exists; to the extent that they help in 
providing credit to a small town, they give liquidity to the 
housing market in that town. Similarly, the CDBG assistance, 
which increasingly benefits low- and moderate-income 
families (accounting for roughly 70 percent of beneficiaries) 
leads to improvements in neighborhoods and therefore to 
people in the neighborhoods and, indirectly, an improve- 
ment in the city. 

Finally, while this paper has concentrated on population 
as a formula or eligibility factor, it is obvious that there is a 
relationship between dollars and votes at all levels of
government.


Turn to the diagram on the following page. Let line (a) 
represent the correct population point in the base year. An 
undercount means that the population reported by the 
census is below (a), which is represented by point (b). If, in 
the current year, population is correctly enumerated for a 

growing city, this means that the true current population 
size (c) is above (a). The distance from (b) to (c) is greater 
than the distance from (a) to (c), and the difference rep- 
resents the overestimate of growth. 

Graphically, if the city is declining, then the current popu-
lation (c) is below point (a). If the decline is such that the
current population is above the point of the undercount (b),
then it can be represented by (c1). If the population decline
was large and below the point of the undercount, it could be
represented by point (c2). If the current population level is
(c1), then a growth is reported, since (c1) > (b). But in
reality a decline was experienced, since (c1) < (a). If the
current population is (c2), then a decline is reported, but
one that is substantially lower than what actually occurred,
since the distance from (a) to (c2) > (b) to (c2).

The true population in both years is equal if the city is 
stable and (a) = (c). Since there was an undercount in the 
base year, represented by (b), a growth is reported for the 
city even though it is stable. The growth is represented by 
the distance from (b) to (c). 

If a city had its population in the base year accurately 
reported, this population level could be represented by point 
(a). If the city has grown, its true current population would 
be some level above (a) and could be represented by point 
(c). An undercount of the current population which is above
the base year could be represented by point (b1). In that
case, the city is helped, since its true growth has been under-
estimated. That is, the distance from (a) to (c) > (a) to
(b1). If, however, the undercount was so severe that it re-
ported a population figure (b2), the city would be helped
even more, since the city would be reported as having de-
clined, (b2) < (a).

If a city is declining and its population in the base period 
was accurately reported as (a), its current population will be 
some point (c), which is below (a). If, however, the current 
population is undercounted, then, by definition, the under- 
count (b) is below (c) and the city will be helped, since its 
decline would have been exaggerated— that is, a decline from 
(a) to (b)> (a) to (c). 

If the population of the city is stable and the base period 
count is accurate, then (a) = (c). An undercount means that a 
population below (a) will be reported at some level such as 
(b). Hence, the city will be helped because it will be shown 
as having declined even though it has not. Note that (a) =
(c).
The situation of a jurisdiction that has been undercounted 
in both periods is a little more complex. The examples shown 
here indicate that the relative size of the undercount and 
whether the city is growing, declining, or stable are
important.
Let us take a jurisdiction that was undercounted in both 
periods but grew between periods. Its accurate count can be 
designated by point (a). An undercount in the base period 
can be represented by some point (b). If it grew between 
periods, this growth can be represented by point (c). An 

Diagram Showing Effects of Undercount

A. Base Year Count Underreported, Current Count Correct

B. Base-Year Count Correct, Current Count Underreported

[The diagram panels cannot be reproduced here. The vertical axis of each
panel measures population size; the panels plot the labeled points (a),
(b), and (c) defined in the accompanying text, with (a) = (c) marking a
stable population.]

undercount in the second period can be represented by
point (d1), (d2), (d3), (d4), or (d5). In the case of (d1),
an undercount which is very slight and close to the true
current population, the jurisdiction will be hurt, since its
growth will be exaggerated: the distance from (b) to (d1) >
(a) to (c).

If the undercount is represented by (d2), the census will
register a growth, (b) to (d2). In this case, the jurisdiction
will be hurt if the distance from (b) to (d2) > (a) to (c); if
(b) to (d2) < (a) to (c), growth is underestimated and the
jurisdiction is helped. If (b) to (d2) = (a) to (c), the
undercount has no effect.

If the undercount is represented by (d3), the jurisdiction's
growth may be underestimated and it is helped if, and only
if, (b) to (d3) < (a) to (c). If the undercount is represented
by (d4), it would register the jurisdiction as having received
zero growth and assist it. If the undercount is (d5), then
the jurisdiction is helped because it is registered as having
declined rather than grown.

Finally, we can look at a jurisdiction that has declined 
in population between the first and second periods and has 
experienced an undercount in each period. Assume that 
(a) is the correct count in the first period and (b) is the
undercount in the first period. If the population is declining,
it could be represented by (c1), (c2), or (c3). An undercount
can be represented by (d1), (d2), (d3), or (d4). If the correct
population size in the second period is (c1), and the under-
count in that period is (d1), the jurisdiction will be hurt,
since the census will register it as growing rather than de-
clining. If the undercount in the second period is (d2), the
census will register a decline, i.e., (b) to (d2). The jurisdiction
is helped if (a) to (c1) < (b) to (d2). The effect is zero if (a)
to (c1) = (b) to (d2). It is hurt if (a) to (c1) > (b) to (d2). It
is also hurt if the undercount is represented by (d3), be-
cause this will represent zero growth rather than a decline.

Suppose, however, that the correct count in the second
period was represented by (c2) and the undercount in this
period by (d4). In this case, the jurisdiction is helped if
(a) to (c2) < (b) to (d4). The undercount will have no effect
if (a) to (c2) = (b) to (d4). It will be injured if (a) to (c2) >
(b) to (d4).

In the case of a stable jurisdiction that has not grown 
between the two periods, its true population in both periods 
will be the same, i.e., (a) = (c). If the undercount is repre- 
sented by (b) and the true population by (a) in the first 
period, then an undercount in the second period could be 
represented either by (d1) or (d2). In the case of (d1), the
jurisdiction is hurt because a growth rather than a stable
population is registered by the census. In the case of (d2),
a decline is registered and the jurisdiction is helped. 


These results indicate that if a city has an undercount in 
the base year and a correct count in the current year, it will 
invariably be injured by a population growth-lag variable; 

if it had a correct count in the base period and an undercount 
in the current period it will be helped. These results hold 
whether cities are growing, declining, or stable. If there are 
undercounts in both periods, the results are uncertain. A 
city could gain, lose, or experience no impact whatsoever 
regardless of whether the city is growing, declining, or 
stable. In general, a very large undercount would be required 
in the current period if a jurisdiction which was under- 
counted in the first period is to be helped. Such a large 
undercount would certainly hurt in every other program, 
such as CETA and revenue sharing. 
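These qualitative results can be checked with a small sketch (the city sizes and undercount rates below are illustrative, assuming only that a growth-lag variable rewards smaller reported growth):

```python
# Sketch: how undercounts in the base and current periods distort the
# growth that a census would register for a growing city.

def reported_growth(base_true, current_true,
                    base_undercount=0.0, current_undercount=0.0):
    """Growth as the census would register it, given undercount rates."""
    base_reported = base_true * (1 - base_undercount)
    current_reported = current_true * (1 - current_undercount)
    return current_reported - base_reported

true_growth = 110_000 - 100_000   # (a) = 100,000 grows to (c) = 110,000

# Base-year undercount only: growth is exaggerated, so the city is hurt
# by a growth-lag variable.
assert reported_growth(100_000, 110_000, base_undercount=0.05) > true_growth

# Current-year undercount only: growth is understated, so the city is helped.
assert reported_growth(100_000, 110_000, current_undercount=0.05) < true_growth

# Undercounts in both periods: the effect can go either way, depending on
# the relative sizes of the two undercounts.
assert reported_growth(100_000, 110_000, 0.05, 0.02) > true_growth   # hurt
assert reported_growth(100_000, 110_000, 0.02, 0.05) < true_growth   # helped
```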


1. Advisory Commission on Intergovernmental Relations. 
Categorical Grants: Their Role and Design, A-52. Wash- 
ington, D.C.: U.S. Government Printing Office, 1978, 
101 and 203. 

2. Bryce, Herrington J. "Characteristics of Growing and 
Declining Cities." Small Cities in Transition: The Dy- 
namics of Growth and Decline, Herrington J. Bryce, 
ed. Cambridge, Mass.: Ballinger, 1976, 29-47.

3. Bryce, Herrington J. "Housing in the Future of De- 
clining Small Cities," The Small and Regional Com- 
munity, Edward J. Miller and Robert P. Wolensky, eds. 
Vol. 2. Stevens Point, Wisconsin: Foundation Press, 
1979, 10. 

4. Bunce, Harold, and Goldberg, Robert L. City Need and 
Community Development Funding. U.S. Department 
of Housing and Urban Development. Washington, D.C.: 
U.S. Government Printing Office, 1979, 89-126. Gives a 
discussion of various need measurements. 

5. Bunce, Harold, and Goldberg, Robert L. City Need and 
Community Development Funding. U.S. Department of 
Housing and Urban Development. Washington, D.C.: 
U.S. Government Printing Office, 1979, 82-130. 

6. Cuciti, Peggy L. City Need and the Responsiveness of 
Federal Grants Programs. Prepared by the Congressional 
Budget Office for the Subcommittee on the City of the 
Committee on Banking, Finance, and Urban Affairs, 
U.S. House of Representatives. Washington, D.C., U.S. 
Government Printing Office, 1978. See also John Shan- 
non and John Ross, "Cities: Their Increasing Depend- 
ence on State and Federal Aid." Small Cities in 
Transition: The Dynamics of Growth and Decline. 
Herrington J. Bryce, ed. Cambridge, Mass.: Ballinger,
1977, 189-208. 

7. Hirsch, Werner Z. "Cost Functions of an Urban Govern- 
ment Service: Refuse Collection," Review of Economics 
and Statistics, 41 (1959), 232-241; J. Riew, "Econo- 
mies of Scale in High School Operation," Review of Eco- 
nomics and Statistics, 48 (1966), 280-287; N. Walzer, 
"Economies of Scale and Municipal Police Services: The 
Illinois Experience," Review of Economics and Sta- 
tistics, 54 (1972), 431-438; C. Tiebout, "Economies 


of Scale and Metropolitan Governments," Review of 
Economics and Statistics, 42 (1960), 442-444; and 
O. D. Duncan, "Optimum Size of Cities," Demographic
Analysis: Selected Readings, J. Spengler and O. D.
Duncan, eds., Glencoe, Ill.: The Free Press, 1956,
632-645; Werner Hirsch, The Economies of State and 
Local Governments, New York: McGraw-Hill, 1970, 
p. 274. 

8. The National Institute of Education. Title I Funds Allo- 
cation: The Current Formula. Washington, D.C.: U.S. 
Government Printing Office, 1977, p. vii. "Detailed 
analysis of data from 24 States indicated that States 
that ignore census data in subcounty allocation alter 
the allocation intended by the Federal formula con- 
siderably. As much as 16 percent of the title I funds in 
some States would shift among districts if the Federal 
statutory formula were substituted for current State 

9. Savage, I.R., and Windham, N. The Importance of Bias 
Removal in Official Use of United States Census Counts. 
Tallahassee, Fla.: Florida State University, 1973. 

10. Strauss, Robert, and Harkins, P. "The Impact of Popu- 
lation Undercounts on General Revenue Sharing Alloca- 

tions in New Jersey and Virginia," National Tax Journal, 
Vol. 27, 617-624. Also, 1970 Census Undercount and 
Revenue Sharing: Effect on Allocations in New Jersey 
and Virginia, Washington, D.C.: Joint Center for Political 
Studies, June 1974. The reader may also wish to see 
Robert B. Hill and Robert Steffes, "Estimating the 1970 
Census Undercount for States and Local Areas," The
Urban League Review, Vol. 1, No. 2 (Fall 1975), 36-45;
Jacob Siegel, Coverage of Population in the 1970 Census 
and Some Implications for Public Programs, Special 
Studies, Series P-23, No. 56, Washington, D.C.: U.S. 
Government Printing Office, 1975; and J. G. Robinson 
and J. S. Siegel, "Illustrative Assessment of Census 
Underenumeration and Income Underreporting on Rev- 
enue Sharing Allocations at the Local Level." The latter 
paper was presented at the annual meeting of the Ameri- 
can Statistical Association, August 13-16, 1979, and will 
appear in Proceedings of the ASA, Social Statistics Section.
11. U.S. Department of Commerce, Bureau of the Census. 
State Payments to Local Governments, Vol. 3. Wash- 
ington, D.C.: U.S. Government Printing Office, 1979, 
p. 9. 


Wray Smith 

U.S. Department of Health, Education, and Welfare 

While the Slater paper indicated that the effect of the 
undercount can and should be assessed, at least for some 
programs, it perhaps should be of secondary concern. Some 
of the basic decisions to be faced by the Census Bureau will 
have to be made without the results of the studies suggested, 
and those studies are likely not to be at all conclusive. Thus, 
some decisions must be made about adjustment, taking into 
account either known effects or those that can be reasonably 
speculated about, but there will not be definitive informa-
tion to show that the decision made was exactly the right
one.

It is important to make a statistical decision on adjust- 
ment soon and as cleanly as possible with what is already 
known and with what collective views as can be used to 
construct a more informed decision. The decision also should 
be based on a concern for the gainers and losers; the designs 
for new programs revolve around cost and coverage— who 
gains and who loses. Program administrators, program plan- 
ners, and the Congress will be inventive enough to modify 
formulas based on the adjustment. 

Although the Bryce paper would have been strengthened 
by the use of some live data, the paper gives a very balanced 
view of an adjustment process— in any process there will be 
gainers and losers— and the determination of that is a very 
complex process. The point of the boundaries on the effects 
of an adjustment, that is, the cushioning effect on any 
adjustment built into a formula with upper and lower 
constraints, is very important. In those formula programs 
that do not have cushioning factors built in, one might wish 
to build in transition arrangements to cushion the impact of 
an adjustment on a major allocation program. Sensitivity to 

the undercount varies between different formula programs 
and must be analyzed in any adjustment process. 

In deciding whether to support an adjustment, com- 
munities are in a risk-averse situation. That is, they would be 
more likely to favor adjustment if it could be done without
exposing themselves to the jeopardy of losing because of an 
adjustment. All things considered, however, there definitely 
should be an adjustment, except in those cases where the 
cost of a very small adjustment is too great. 

The effect of the compounding over time of an error on 
the allocated amount also is an important consideration. Mr. 
Bryce said that a reality in adjustment is that what might 
appear to be a small cost to a community of an undercount, 
would become significant over time. In an entitlement 
process, what is distributed is really a notification of an 
entitlement, not the actual money, and the community may 
have to make application to get the money it is entitled to. 
The community may not realize what it gains or loses until it 
is explained. 

There are three distinguishable types of losers and gainers. 
The first is in the case of an identifiable individual person 
who loses or gains, for example, in the Farmers Home 
Administration 502 program. In the general revenue shar- 
ing program, the losers or gainers are identifiable juris- 
dictions. In the Community Development Block Grant 
Program, which has a discretionary fund, there are no 
identifiable losers or gainers because of the way the program 
is set up. In talking about losers, it may not be possible to 
identify individuals, and the applicant could not know 
whether it would actually win if its figures were adjusted for
the undercount.



It was observed that although most discussion focused on 
the general revenue sharing program and the discussion was 
in terms of a zero sum, other programs are not so 
competitive because they are not zero sum. In fact, it was 
suggested that the fixed pie in revenue sharing should not 
necessarily be fixed. If a large number of persons were added 
to the national total, perhaps the total amount of money 
allocated should be adjusted. It is also not inconceivable that 
the formula would be changed after an adjustment was made. 

Discussion followed concerning what action was likely if 
Congress attempted to take into account the impact of the 
undercount. The question of a transitional fund arose, as well 
as the question of holding communities harmless. Con- 
gressional debate might consider an adjustment, as long as 
those communities that would otherwise have received a 
certain number of dollars would be held harmless. This was 
felt to be a likely outcome, since it reduces the amount of 
acrimony and would in the long run reduce the amount of 
adjustment made. The problem is that the debate is not only 
about funds, but also about congressional districts and power. 

It was agreed that it is virtually impossible for a 
jurisdiction to compute the differential impact of an adjust- 
ment in the many formula programs. The formulas are far 
too complex and there are insufficient data available to make 

an estimate of impact. Under the Community Development Block Grant
Program, for example, the city is one of the communities 
benefited by the growth-lag formula. At the same time, 
however, a number of new standard metropolitan statistical 
areas had to share in the same total amount, and the real 
impact of the change is unknown. Similarly, if all of the 
elements of a formula were simultaneously updated, there 
would be no means to compute the effect of each factor. It 
was concluded that the elements in formulas and the 
regulations concerning them are far too complicated; they 
should be simplified and made more uniform in their impact 
on cities. It was noted, however, that the Office of Federal 
Statistical Policy and Standards has added a staff member 
whose only responsibility will be to examine Federal 
allocation formulas. This person will work with the Congress 
and in coordinating agency work on formulas. 

Finally, concern was expressed as to the possible 
compounding effect of adjustments. For example, if there 
were a 5-percent error that was not constrained by the 
boundaries, it would double within 14 years and, within 20
years, increase by 165 percent. But that assumes a linear
relationship (with no feedback from lost funds) on attri- 
butes of the population prone to undercount, which might 
further increase the error. 
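The arithmetic behind those figures can be verified directly, assuming the 5-percent error compounds annually (an illustrative assumption, not a claim about any particular program):

```python
# Compound a 5-percent annual error and check the figures quoted above.
factor_14 = 1.05 ** 14   # growth factor after 14 years
factor_20 = 1.05 ** 20   # growth factor after 20 years

assert 1.9 < factor_14 < 2.1                  # roughly doubles in 14 years
assert abs((factor_20 - 1) - 1.65) < 0.01     # about a 165-percent increase in 20

print(f"14 years: x{factor_14:.2f}; 20 years: +{factor_20 - 1:.0%}")
```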




The Synthetic Method: Its Feasibility for Deriving

the Census Undercount for States

and Local Areas

National Urban League 


The issue of an undercount has plagued every decennial 
census since its inception. In fact, for the first census, the 
Government mandated that minorities be undercounted! In 
that census of 1790, only three of every five slaves were to be
counted for purposes of political apportionment.
Thus, from the outset, 40 percent of the black population 
was left out of the census. 1 Since then, blacks have 
continued to be disproportionately undercounted in decen- 
nial censuses— although not always as a result of overt 
governmental mandate. 

In the census of 1870, for example, the Census Bureau 
estimated that blacks accounted for 41 percent (or 512,163) 
of the total 1,260,078 persons not counted. Or, while 1.9 
percent of the white population was missed in 1870, 9.5 
percent of the black population was left out [15]. Appar- 
ently, some of the highest undercount rates for blacks 
occurred in the censuses conducted between 1920 and 1940. 
While 12.5 percent and 12.7 percent of blacks were not 
counted in the 1930 and 1940 censuses, respectively, 15.2 
percent of all blacks were estimated to have been missed in 
the census of 1920. But by 1960 and 1970, the Census 
Bureau estimated that the black undercount rate fell to 8.0 
percent and 7.7 percent, respectively [15] . 

Although the magnitude of the undercount for other 
minorities, such as Hispanics and Native Americans, has not 
yet been systematically specified, the evidence available 
strongly suggests that they have been disproportionately 
undercounted as well. But the issue of the census undercount 
is not merely of academic or technical importance; it has 
serious social, economic, political, and legal implications as 
well [1]. 

First of all, a disproportionate undercount of subgroups 
severely flaws the accuracy and adequacy of social planning 
efforts, especially needs assessments, population projections, 
and the distribution of social and economic programs and 
services to those groups [4] . 

Secondly, since population figures are used, in part, by 

¹While 40 percent of black slaves were omitted from the
population counts used for allocating seats for each State in the
House of Representatives, all of them were included in the total
population counts in the censuses conducted between 1790 and 1860.
Thus, two census counts were maintained: A total population count 
and an "adjusted" count for purposes of political apportionment. See 
Hyman Alterman, Counting People: The Census in History, New 
York: Harcourt, Brace, and World, 1969, especially chapter 7, "The 
Negro in the American Census," pp. 262-290. 

over 100 Federal programs to allocate billions of dollars each 
year in such areas as education (title I, free lunches),
employment (Comprehensive Employment and Training 
Act), housing (Community Development Block Grants), 
economic development (Economic Development Admini- 
stration), social services (title XX, child abuse, elderly), crime 
(Law Enforcement Assistance Administration), health, trans- 
portation, and revenue sharing, States and localities with 
disproportionate undercounts are deprived of their equitable 
share of these Federal grants-in-aid [17] . 

Third, since the census count is used as the basis for 
allocating representation, not only in the House of Repre- 
sentatives, but in State and local policymaking bodies as well, 
States and local areas with disproportionate undercounts are 
also deprived of their equitable share of political represen- 
tation at all levels of government [16] . 

Fourth, the deprivation of an equitable share of political 
representation and financial aid to States and localities, as 
well as to subgroups, with disproportionate undercounts 
raises significant constitutional and legal questions as well. In 
fact, one of the resolutions adopted at a 1967 conference on 
the census undercount (sponsored by the M.I.T.-Harvard 
University Joint Center for Urban Studies) effectively sets 
forth the constitutional issue: 

We believe that what, initially at least, were technical 
problems have by their very magnitude been transformed 
into social problems with powerful legal and ethical 
implications. Specifically, we hold that where a group 
defined by racial or ethnic terms and concentrated in 
specific, political jurisdictions, is significantly under- 
counted in relation to other groups, then individual 
members of that group are thereby deprived of the 
constitutional right to equal representation in the House 
of Representatives and, by inference, in other legislative 
bodies. Further, we hold that individual members of such 
a group are thereby deprived of their right to equal 
protection of the laws as provided by Section 1 of the 
14th Amendment to the Constitution in that they are 
deprived of their entitlement to partake in Federal and 
other programs designed for areas and populations with 
their characteristics. 

Injury, while general, is real; redress is in order. This 
would seem a matter of special concern to the Nation in 
view of recent Supreme Court rulings establishing the 
"one-man-one-vote" principle in apportioning legislatures 
and in view of the extensive Congressional activity in the 
establishment of programs designed to improve the 
economic and social status of just those groups that 
appear to be substantially underrepresented in our current 
population statistics [1] . 



Fifth, and finally, a disproportionate undercount of 
minorities hurts nonminorities as well. Since areas with 
disproportionate undercounts of minority groups are 
deprived of their equitable share of Federal financial assist- 
ance, those States and localities often have to place greater 
tax burdens on all residents. 

Thus, the census undercount has significant social, 
economic, political, and legal ramifications. And the 
inequities to local areas as well as to population groups with 
disproportionate undercounts require immediate redress. But 
what actions can be taken to reduce the inequities to areas 
and groups as a result of the census undercount? 

There is widespread agreement that some adjustment of 
the population figures for States and local areas to correct 
for the census undercount is desirable. But there is little 
consensus regarding such related issues as: 

(a) What methods can be used to correct for the census 
undercount for States and local areas— the synthetic, 
demographic, or matching method? 

(b) Which method is most feasible and reliable for 
adjusting for the undercount for localities? 

(c) Should adjusted population figures be used for 
purposes of political apportionment as well as for 
financial allocations to States and localities? 

This paper will attempt to address these questions by 
assessing the comparative strengths and weaknesses of the 
synthetic method for adjusting the census undercount for 
States and local areas. 

The second section of this paper will briefly describe the 
synthetic method and its basic assumptions, while the third 
section will provide an overview of research studies that have 
used the synthetic method. In the fourth section, the 
comparative advantages and disadvantages of the synthetic 
method will be assessed according to various criteria: Internal 
consistency, simplicity, timeliness, flexibility, equity, and 
reliability. The concluding section will propose specific 
recommendations for using the synthetic method to adjust 
for the census undercount for States and local areas. 


The synthetic method is a statistical procedure for 
distributing the undercount of a larger geographical area 
(such as the Nation, State, county, or city) among its 
subunits (such as a State, county, city, or congressional 
district, respectively). For example, this method permits one 
to distribute the total 5.3 million persons that the Census 
Bureau estimated had been left out of the 1970 census not 
only among all 50 States, but also among every subdivision 
(such as standard metropolitan statistical areas, counties, 
cities, towns, congressional districts, wards, neighborhoods, 
census tracts, and planning areas) within each State. Similarly,
the synthetic method allows one to distribute the 1.9 million 

blacks and 3.4 million whites that were not included in the 
official 1970 census count throughout all geographical 
subdivisions below the national level. 

The Null Hypothesis 

The synthetic method requires only one basic assumption: 
the null hypothesis (i.e., the assumption of "no" difference). 
The null hypothesis is a time-honored and widely accepted 
practice in statistics which assumes that the means estimated 
for samples (or subunits) are not statistically different from 
the mean estimated for the 
universe (or the total population). With regard to the 
undercount, the null hypothesis assumes that the estimates 
of the undercount for specific race/sex/age groups in various 
subunits below the national level (such as States, counties, 
and cities) are not statistically different from the undercount 
estimates for the same race/sex/age groups at the national 
level. 2 

For example, the Census Bureau estimated that the 
national undercount rates for black and white males 35 to 39 
years old were 17.8 percent and 4.1 percent, respectively. 
Based on the null hypothesis, the synthetic method assumes 
that the undercount rates for black and white males 35 to 39 
years old at units below the national level will not be 
statistically different from their undercount rates at the 
national level. More specifically, the synthetic method 
assumes that white males ages 35 to 39 will be undercounted 
at a rate of 4.1 percent in all subnational units, while black 
males ages 35 to 39 will be undercounted at a rate of 17.8 
percent in all subnational localities (see table 1). 

It is important to note that the null hypothesis is most 
often used by statisticians in situations where one does not 
have a reliable basis for inferring the magnitude or direction 
of differences between the means of samples or subunits. In 
this instance, we have no basis for knowing whether, for 
example, black males ages 35 to 39 are undercounted in 
Birmingham, Ala., or Detroit, Mich., at a rate higher or lower 
than the 17.8 percent rate for black men in this age category 
at the national level. We are constrained, therefore, to accept 
the null hypothesis: The undercount rates for black males 
ages 35 to 39 in Birmingham and in Detroit are not 
statistically different from the undercount rate for black men 
in that age group at the national level. 

Another way to make clear the basic assumptions of the 
synthetic method is to be explicit about what this method 
does not assume: 

(a) First, the synthetic method does not assume that all 

blacks (or all whites, for that matter) are undercounted 
to the same extent in every State, county, or city. On the 
contrary, since the national undercount rates vary among 
blacks by sex and age, the synthetic method also assumes 
that the total undercount rates for blacks in different 
States and localities will vary, depending upon the sex 
and age distributions of the black populations in those 
localities. This is why the synthetic method yields 
different undercount rates for blacks in different States 
(e.g., Vermont, 7.9 percent; Maine, 7.6 percent; New 
York, 7.2 percent; South Carolina, 6.6 percent; Missis- 
sippi, 6.3 percent) and cities (e.g., New York, 7.2 
percent; Philadelphia, 6.9 percent; New Orleans, 6.7 
percent; Charleston, 6.5 percent). 

(b) Second, the synthetic method does not assume that the 
undercount rates for persons in the same race/sex/age 
categories are the same in absolute numbers in different 
subnational units. On the contrary, this method assumes 
that the undercount rates for persons in the same 
race/sex/age groups are different in different subnational 
localities, but that those differences are not statistically 
significant. In other words, for example, we assume that 
the undercount rate for black males ages 35 to 39 in 
Birmingham will differ in absolute numbers from the 
undercount rate for black males 35 to 39 in Detroit, but 
that neither undercount rate will be statistically different 
from the 17.8 percent undercount rate for black males in 
that age category at the national level. 

2 See description of the null hypothesis in any standard text on 
sampling statistics. However, since a null hypothesis traditionally 
refers to hypotheses that can be independently tested, Professor 
Harry Roberts, University of Chicago, suggests that it would be more 
appropriate to refer to the assumptions underlying our synthetic 
method as a "maintained hypothesis." 

Table 1. Estimates of Net Undercount Rates, by Race, 
Sex, and Age: 1970 

Table 2. Estimates of Percentage Increase Necessary to 
Adjust for Undercount, by Race, Sex, and Age: 1970 

(Table bodies not reproduced. Each table presents figures for 
black and white males and females by age group, from under 5 
years to 75 years and over.) 

Note to table 1: These rates refer to the adjusted percent 
undercounts in Set D presented in tables 4 and 5 of Jacob 
Siegel's paper, cited below. The rates shown for blacks are for 
blacks and other nonwhite races. 

Note to table 2: The percentage increases shown for blacks are 
for blacks and other nonwhite races; they were derived by the 
National Urban League. 

Source: Prepared by the National Urban League Research 
Department from Census Bureau undercount rates in Jacob Siegel, 
"Estimates of Coverage of the Total Population by Sex, Race, 
and Age in the 1970 Census," U.S. Census Bureau, 1973; and 1970 
General Population Characteristics. Table 2 reprinted from 
Robert B. Hill and Robert B. Steffes, "Estimating the 1970 
Census Undercount for State and Local Areas," NUL Research 
Department, Washington, D.C., July 23, 1973. 

Deriving Local Undercounts 

The synthetic method can easily be used by nonsta- 
tisticians to derive the census undercount for any areas below 
the national level. One only needs to apply the appropriate 
percentage increase to the official 1970 census count for 
specific race/sex/age groups in specific localities. For 
example, an undercount rate of 14.4 percent for black males 
ages 30 to 34 requires that the official (or published) census 
count of black men 30 to 34 years old in a locality be 
increased (or inflated) by 16.8 percent (or, more appropri- 
ately, by 116.8 percent). Similarly, an undercount rate of 9.7 
percent for black females under the age of 5 requires that the 
official census count of black females under 5 be increased 
by 10.7 percent (or by 110.7 percent). This step is repeated 
for each age category in each of the four key subgroups: 
black males, black females, white males, and white females. 
Using the percentage increases listed in table 2, one can 
derive and adjust for the census undercount for any State or 
local area. 
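The arithmetic just described can be sketched in a few lines. In the sketch below, the undercount rates are the national figures quoted in the text, but the local census counts are invented for illustration; the sketch also shows why the synthetic method yields different overall undercount rates for localities whose demographic mixes differ, even though every group-specific rate is identical everywhere.

```python
# Sketch of the synthetic adjustment described above. Undercount
# rates are the national figures quoted in the text; the local
# census counts are hypothetical.

def adjustment_factor(rate):
    """A net undercount rate r means the published count must be
    multiplied by 1 / (1 - r); e.g., r = 0.144 implies a 16.8-percent
    increase and r = 0.097 implies a 10.7-percent increase."""
    return 1.0 / (1.0 - rate)

national_rates = {
    ("black", "male", "35-39"): 0.178,
    ("white", "male", "35-39"): 0.041,
}

def corrected_counts(published):
    # Null hypothesis: every locality receives the national rate
    # for each race/sex/age group.
    return {group: count * adjustment_factor(national_rates[group])
            for group, count in published.items()}

# Two hypothetical localities with opposite racial compositions.
city_a = {("black", "male", "35-39"): 8000, ("white", "male", "35-39"): 2000}
city_b = {("black", "male", "35-39"): 2000, ("white", "male", "35-39"): 8000}

def overall_rate(published):
    corrected = sum(corrected_counts(published).values())
    return 1.0 - sum(published.values()) / corrected

# city_a's overall undercount rate exceeds city_b's, even though
# each group's rate is the same in both localities.
```

Running `overall_rate` on the two cities gives roughly 15 percent for the first and 7 percent for the second, which is the sense in which total rates vary with local sex, age, and race distributions while the group-specific rates do not.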


Table 3. Estimated Corrected Count of U.S. Population by State, for All Races: 1970 

(Table body not reproduced. Columns show, for each State: 
percent undercount, published count, corrected count, amount 
of undercount, and percent distribution.) 

Source: Prepared by the National Urban League Research Department from data in Jacob Siegel, "Estimates of Coverage of 
the Population by Sex, Race, and Age in the 1970 Census," U.S. Census Bureau, 1973; and 1970 General Population Charac- 
teristics for each State. 
Table 4. Estimated Corrected Count of U.S. Population by State, for Blacks and Other Races: 1970 



(Table body not reproduced. Columns show, for each State: 
percent undercount, published count, corrected count, amount 
of undercount, and percent distribution.) 

Source: Prepared by the National Urban League Research Department from data in Jacob Siegel, "Estimates of Coverage of 
the Population by Sex, Race, and Age in the 1970 Census," U.S. Census Bureau, 1973; and 1970 General Population 
Characteristics for each State. 



Since 1973, the synthetic method has been used by 
several nongovernmental researchers and organizations to 
correct for the census undercount. However, at the time that 
the Census Bureau released its analysis of the undercount in 
the 1970 census by race, sex, and age at the national level on 
April 25, 1973, the Bureau indicated that it had not 
developed a method for distributing the national undercount 
among States and local areas [7] . 

Consequently, the National Urban League's (NUL) Re- 
search Department decided to investigate the feasibility of 
developing estimates of the census undercount for areas below 
the national level. On July 23, 1973, the NUL Research 
Department released its report, "Estimating the 1970 Census 
Undercount for States and Local Areas," at a press con- 
ference held during the National Urban League's Annual 
Conference that summer. That study distributed the national 
undercount of 5.3 million persons among all 50 States as well 
as among 36 selected cities. Moreover, it provided crude 
estimates of the amount of general revenue sharing funds lost 
by 50 States and 20 cities as a result of the census 
undercount [2] . 

Another study using the synthetic method to adjust for 
the undercount was released in July 1973 by I.R. Savage and 
B.M. Windham of Florida State University. However, that 
report derived estimates of the census undercount only for 
the State of Florida. Moreover, it used only the national 
undercount rates for blacks and whites by race to adjust for 
the undercount. Therefore, variations in the undercount 
among blacks (and whites) due to different sex and age 
distributions in different Florida localities were not systemat- 
ically incorporated by this crude version of the synthetic 
method. Thus, this study, in effect, made the clearly 
untenable assumption that the undercount rate was the same 
for all blacks (and for all whites) in the State of Florida 
regardless of their sex and age differences [6] . 

Most of the subsequent research studies using the syn- 
thetic method focused on the impact of the undercount on 
Federal grants-in-aid to States and local areas. Strauss and 
Harkins prepared an analysis of the impact of the census 
undercount on the allocation of general revenue sharing 
(GRS) funds to the States of New Jersey and Virginia for the 
Joint Center for Political Studies. This study not only 
corrected for the population component in the general 
revenue sharing formula by using the national undercount 
rates for specific race/sex/age groups, but also updated per 
capita income for localities by race and sex [10] . 

A comprehensive evaluation of the impact of the census 
undercount on GRS allocations to subnational jurisdictions 
throughout the country was conducted by the Stanford 
Research Institute (SRI) under contract to the Office of 
Revenue Sharing of the U.S. Treasury Department. But SRI's 
mandate was much broader than solely assessing the signifi- 
cance of adjusting for the population undercount in allo- 

cating GRS funds. Its primary objective was to determine the 
inequities that might result from deficiencies in any of the 
major data elements in the GRS allocation formulas, i.e., 
total population, per capita income, adjusted local taxes, 
urbanized population, personal income, State and local taxes, 
State individual income tax, Federal individual income tax 
liabilities, and intergovernmental transfers [9]. 

Some of the major defects in the GRS data elements 
identified by SRI were lack of timeliness, lack of compre- 
hensiveness, and inaccuracies. The uneven currency of the 
data was reflected in the fact that while updated figures for 
general tax efforts were available for all 39,000 jurisdictions, 
the per capita income figures used in the allocation formula 
for States and localities were based on 1969 total money 
income from the 1970 census. Similarly, while updated 
annual estimates of total population for States were used to 
allocate GRS funds, the total population figures used for all 
sub-State jurisdictions were from the 1970 census. Furthermore, 
the omission of 5.3 million uncounted persons from 
the updated State figures and from the outdated sub-State 
figures compounded the incompleteness and inaccuracy of 
the population figures used in the GRS formula. 

Although the Stanford Research Institute's GRS data 
study of 1974 revealed that population was not as significant 
a factor as income or general tax effort in the GRS formula, 
it nevertheless concluded that the equity of allocations to the 
50 States and the District of Columbia would be increased if 
the synthetic method were used to adjust for the population 
undercount at the State level. Unfortunately, SRI's recom- 
mendation to improve the accuracy of the population data in 
the GRS formula by adjusting for the census undercount has 
not yet been implemented by the Office of Revenue Sharing. 

At a meeting in May 1974, a black leadership group 
(which was the forerunner of the Census Advisory Com- 
mittee on the Black Population for the 1980 Census) 
requested that the Census Bureau prepare its own assessment 
of the impact of the census undercount on the distribution 
of Federal funds and political representation to States and 
local areas. That study, formally issued in August 1975, used 
several versions of the synthetic method based on different 
assumptions to derive various estimates of the census 
undercount for States. It concluded that, in general, rela- 
tively small shifts in the distribution of GRS funds and of 
political representation at the State level would occur if 
improvements were made in the quality of population and 
other data elements in allocation formulas. Moreover, the 
Bureau cautioned that, while it used the synthetic method to 
derive illustrative estimates for purposes of its analysis, this 
study should not be construed as an endorsement (or as a 
repudiation) of the synthetic method [11]. 

On the other hand, the National Commission on Employ- 
ment and Unemployment Statistics, in its final report issued 
in September 1979, strongly recommended that the syn- 
thetic method be used to adjust for the undercount in 
governmental labor force statistics. It argued that an adjust- 
ment for the undercount would in fact be smaller in 
magnitude than the adjustments the Census Bureau tradi- 
tionally makes to account for underreports (of income and 
unemployment, for example) in the Current Population 
Survey (CPS): 

Because the uncounted population is not directly 
measured, the undercount adjustment of labor force 
statistics would require an assumption that, within each 
demographic group, the labor force status of persons 
missed in the census resembles the status of those 
counted. Granted that this comparability assumption is 
questionable, it still may be less objectionable than 
assuming that unenumerated persons do not exist. Fur- 
thermore, comparability assumptions are already made in 
present noninterview adjustments and second-stage ratio 
estimates. The percentage rate of underreporting in the 
CPS sample, relative to controls derived from the census 
counts, is actually larger in magnitude than the census 
undercount relative to the independently derived esti- 
mates of the true population for most cells in table 8-5. 
This CPS underrepresentation has always been offset by 
assuming that the unreported persons have the same 
characteristics as enumerated persons in the same age-sex- 
race groups. Adjusting for the census undercount would 
merely be an extension of present practice— and a lesser 
magnitude— to bring the CPS estimates in line with the 
estimated true population [3] . 

In fact, the Commission's recommendation that the labor 
force statistics be adjusted for the undercount was a 
reaffirmation of the position of its predecessor, the Presi- 
dent's Committee to Appraise Employment and Unemploy- 
ment Statistics, which was appointed by President John F. 
Kennedy in 1961. In its report released in 1962, this 
committee, more popularly known as the Gordon Com- 
mittee, recommended "that in the preparation of the 
household survey estimates, an adjustment be introduced to 
take account of underrepresentation of the population in the 
decennial census." However, this recommendation was not 
implemented by the Census Bureau [3] . 

The Panel on Decennial Census Plans established in 1978 
by the National Research Council of the National Academy 
of Sciences also concluded (a) that inequities in the 
allocation of Federal funds could be reduced by adjusting for 
the census undercount for States and local areas, and (b) that 
feasible methods for making such adjustments already 
existed. Yet, the Panel stopped short of recommending (a) 
that a specific method be used to make such adjustments and 
(b) that the Secretary of Commerce direct the Census Bureau 
to make these adjustments in the 1980 census population 
figures for States and local areas [5] . 

Thus, this overview of past uses of the synthetic method 
reveals that there is widespread consensus among nongovern- 
mental researchers that adjustments for the census under- 
count would reduce inequities to State and local areas and 
that the synthetic method is one feasible method for making 
such adjustments. But what are some advantages and disad- 
vantages in using the synthetic method for such purposes? 


In order to properly assess the comparative advantages 
and disadvantages of using the synthetic method, it is 
necessary to briefly describe the basic elements of other 
methods that have been used in the past to derive estimates 
of the census undercount— most especially, the matching and 
demographic methods. 

Matching Method 

The matching method was the principal means of deriving 
estimates of the census undercount in the 1950 census. It 
involves matching the results from at least two independent 
sources. This has often meant comparing the characteristics 
of persons in the census with (a) the characteristics of 
persons obtained in the monthly Current Population Survey 
(CPS) of about 50,000 households or (b) with the charac- 
teristics of persons in an independent list or register. 
However, since lists by their very nature are incomplete and 
relate to only segments of the population (such as heads of 
households, children who are students, medicare enrollees, 
and automobile owners), more than one list would often be 
needed in order to properly match up with a census 
household roster. Obviously, a major difficulty, in addition 
to finding a sufficiently complete listing, is to obtain a list 
that is contemporaneous with the census. Estimates of the undercount 
were derived for 1970 by matching the census to the CPS. 
But males did not show a higher undercount rate than 
females according to the matching estimates, although this 
relationship has been consistently documented, based on 
demographic analyses. This discrepancy is most likely due to 
correlation bias: persons who are missed in the census are 
also more likely to be missed in the CPS [5] . 

Demographic Method 

The method of demographic analysis has been the basic 
procedure for deriving national estimates of the undercount 
for censuses in 1950, 1960, and 1970. It involves comparing 
the actual number of persons enumerated in the census with 
the number of persons "expected" to be counted. The 
"expected" or "true" population is derived independently of 
the census and refers to "the number born minus the number 
who have died, adjusted to account for the number who have 
moved into or out of the country." Thus, a variety of sources 
are used: Birth records, death certificates, Medicare records, 
and immigration and emigration statistics [5] . 
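The accounting identity behind the demographic method can be written out directly. All figures below are hypothetical and serve only to show the balance involved; they are not actual census data.

```python
# Demographic analysis in miniature: the "expected" population is
# built from vital records and migration statistics, independently
# of the census. All numbers are hypothetical (in thousands).

base_population = 150_000   # population at the previous census
births = 35_000             # from birth records
deaths = 17_000             # from death certificates
immigration = 4_000         # from immigration statistics
emigration = 1_500          # from emigration statistics

expected = base_population + births - deaths + immigration - emigration

census_count = 168_000      # hypothetical enumerated total
net_undercount = expected - census_count
undercount_rate = net_undercount / expected
# expected = 170,500, so the net undercount is 2,500,
# an undercount rate of about 1.5 percent.
```

The quality of every term on the right-hand side limits the quality of the estimate, which is the point made above about immigration, emigration, and State-of-birth records.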

Clearly, the accuracy of estimates derived by the demo- 
graphic method is directly related to the quality of data from 
records. The existence of large numbers of undercounted and 
uncounted illegal aliens underscores the unreliability of 
immigration statistics. But statistics on emigrants leaving the 
United States, especially those that have been collected by 


the Immigration and Naturalization Service, are similarly of 
questionable reliability. 

In addition, indepth analyses by the Census Bureau have 
revealed that State-of-birth records for many persons, 
especially blacks, inappropriately refer to one's current State 
of residence rather than to one's actual State of birth. Thus, 
State-of-birth data are often unreliable because of under- 
reporting and misreporting. 

Furthermore, since ethnic origin is rarely recorded on the 
birth and death records for most States (not to mention 
internal migration information), it has not been possible to 
derive national estimates of the undercount for such groups 
as Hispanics, using the demographic method. 

Consequently, because of the incomplete and uneven 
quality of birth, death, and most migration data for various 
racial, ethnic, and age groups in different States, a number of 
questionable assumptions must be made to derive national 
undercount estimates by the demographic method. 


Imputation Method 

A procedure that is not commonly perceived as a method 
for deriving estimates of the undercount (but which in effect 
does just that) is that of imputation. Imputation is a method 
by which the Census Bureau imputes the existence of persons 
not contacted by the census or allocates characteristics to 
persons who were enumerated in the census but for whom 
certain traits (such as race, sex, and age) are missing from the 
census forms [5] . 

Many opponents of recommendations to adjust for the 
census undercount usually assume that the official census 
counts consist solely of persons who were actually contacted 
or interviewed and contain no adjustments for persons 
missed or not contacted. In fact, for decades the Census Bureau has 
imputed the existence of persons or their characteristics 
because of inadequacies in the enumeration procedure or in 
the computerized processing of data. And the Bureau has 
regularly published these imputed figures for all concerned 
parties to examine [14] . 

In the 1970 census, for example, 4.9 million persons (or 
2.4 percent of 203 million persons) were imputed— 2.7 
million because of enumeration deficiencies and 2.2 million 
because of processing failures. Some of the enumeration 
inadequacies include (a) "closeout" cases, i.e., households 
that were visited several times, but no one was found to be at 
home; (b) households that refused to cooperate; (c) persons 
whose existence was determined after the census enumera- 
tion by a post office check; and (d) housing units that were 
recorded as vacant, but which may have been occupied. 
Examples of processing failures include instances in which 
questionnaires may not have been properly microfilmed or 
read by the automatic data processing system (i.e., 
FOSDIC— Film Optical Sensing Device for Input to 

Characteristics are imputed or allocated to unenumerated 

persons and households by randomly assigning charac- 
teristics from other persons and households in their 
neighborhood. When a deficient record emerges, charac- 
teristics from the most recently processed record are com- 
pletely or partially duplicated and assigned to the record 
with the missing information. 
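A minimal sketch of this "duplicate the last processed record" allocation rule follows; the data and field names are invented, and the Bureau's actual procedures are considerably more elaborate.

```python
# Toy version of the allocation rule: when a record is missing a
# characteristic, copy it from the most recently processed record
# that has one. Data and field names are invented.

def allocate(records, field):
    """Fill missing values of `field` from the last complete record."""
    last_value = None
    for record in records:
        if record.get(field) is None:
            if last_value is not None:
                record[field] = last_value   # duplicated characteristic
        else:
            last_value = record[field]
    return records

households = [
    {"id": 1, "race": "white"},
    {"id": 2, "race": None},   # unreported: inherits the prior value
    {"id": 3, "race": "black"},
]
allocate(households, "race")
```

A rule of this kind takes no account of whom the value is copied from, which is exactly why an uncontacted household can end up assigned characteristics from a household of a different race.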

Obviously, some of these imputation methods are ques- 
tionable. For example, uncontacted blacks living in predomi- 
nantly white areas and uncontacted whites living in 
predominantly black areas would most likely be assigned 
characteristics from households of the opposite race, based 
on these allocation procedures. Moreover, while national 
vacancy surveys may reveal that every nth housing unit that 
is designated as vacant is in fact occupied, the random 
assigning of people and characteristics of households to every 
nth vacant unit is highly tenuous. 

In short, the official census counts used for congressional 
apportionment and governmental grants-in-aid traditionally 
contain imputed or adjusted numbers for millions of persons 
who were never contacted or interviewed by census enumera- 
tors. Thus, the Census Bureau's published national estimates 
of the undercount do not reflect, as is popularly assumed, 
the universe of persons not contacted by the census. In other 
words, the "actual" number of persons not contacted in the 
1970 census was 10.2 million. However, since 4.9 million of 
them were imputed by allocation procedures, this left the 
residual of 5.3 million still uncounted. Consequently, adding 
the 5.3 million uncounted persons to the official count 
would merely be an extension of current imputation and 
adjustment procedures. 

Comparative Criteria 

We will now review the comparative advantages and 
disadvantages of the synthetic method according to the 
following criteria: Internal consistency, simplicity, time- 
liness, flexibility, equity, and reliability. 

1. Internal Consistency. Ideally, any method for deriving 
subnational estimates of the undercount should yield esti- 
mates that are internally consistent with those at the national 
level or at the next larger geographical context. Internal 
consistency of undercount figures would greatly facilitate the 
proper apportioning of governmental funds or political 
representation among all geographical subunits. For example, 
undercount figures for an aggregate of cities should equal 
those for their respective counties, which, in turn, should 
equal the figures for their respective SMSA's, which, in turn, 
should equal the figures for their respective States, etc. 
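The consistency property described above can be checked mechanically for the synthetic method: because every area applies the same group-specific adjustment factor, corrected subunit counts sum exactly to the corrected count of their parent area. The factors and counts in this sketch are hypothetical.

```python
# Internal consistency of synthetic estimates, by construction:
# every area applies the same group-specific adjustment factor,
# so corrected city counts sum exactly to the corrected county
# count. Factors and counts are hypothetical.

factors = {"black": 1.081, "white": 1.020}

cities = [
    {"black": 50_000, "white": 120_000},
    {"black": 30_000, "white": 200_000},
]

def corrected(counts):
    return sum(counts[group] * factors[group] for group in counts)

# Aggregate the cities into their county, then compare totals.
county = {group: sum(city[group] for city in cities) for group in factors}

assert abs(sum(corrected(city) for city in cities) - corrected(county)) < 1e-6
```

The same argument applies at every level of aggregation (counties into SMSA's, SMSA's into States), which is why no after-the-fact reconciliation step is needed.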

While the demographic method appears to be the most 
reliable procedure for deriving undercount estimates by 
race/sex/age at the national level, it is not the most feasible 
for yielding internally consistent estimates below the 
national level. In fact, the Census Bureau's own indepth 
research in this area reveals that the demographic method 
would produce undercount estimates for States that would 


be widely inconsistent with national estimates because of the 
uneven quality of birth and death records and the lack of 
adequate internal migration data between States. Highly 
questionable adjustments would be needed to make those 
States' figures congruent with nationwide figures [13] . 

The matching method, on the other hand, would yield 
even less reliable and consistent undercount estimates for 
States than the demographic method. Because of the low 
probability of securing adequate and timely lists or rosters of 
household composition, the matching method appears to be 
one of the least feasible for producing subnational under- 
count estimates that would be internally consistent with 
national estimates. Adjustments of such an arbitrary nature 
would be required to make the widely varying undercount 
estimates for States derived by the matching method 
consistent with national estimates as to render the results meaningless. 

The synthetic method, by definition, yields undercount 
estimates for subnational units that are internally consistent 
with national estimates. Thus, no further adjustments of 
these estimates would be needed, as in the case of the 
demographic or matching methods. 

In addition, the imputation method could be used to 
allocate synthetic subnational estimates of the undercount to 
specific household records consistent with national estimates. 
Using an approach similar to that by the Census Bureau in 
assigning persons to vacant units, one could randomly 
apportion unenumerated persons by race, sex, and age to 
both enumerated and unenumerated households based on the 
synthetically derived undercount estimates for specific 
subnational geographic areas. 

2. Simplicity. The synthetic method is by far the simplest 
to apply in deriving subnational undercount estimates. This 
simplicity is a result of the fact that only one assumption 
needs to be made in order to implement it— the null 
hypothesis. Since each of the other methods— demographic, 
matching, and imputation— would require different assump- 
tions and adjustments, depending on the quality of the data 
available in different States, much more sophisticated statis- 
tical skills and training would be needed by users to apply 
these methods. 

On the other hand, the synthetic method can easily be 
applied by nonstatisticians, such as congressional aides, 
representatives of community-based organizations, and other 
research-oriented individuals and public interest groups. 

3. Timeliness. Timeliness is an essential feature for 
assessing the viability of a method for adjusting the under- 
count. Clearly, a method that required more than half a 
decade to derive subnational undercount estimates would be 
of little practical utility. 

Ideally, one would like to use a method that could adjust 
for the undercount before the official counts are turned over 
to the President and Congress. Since the Census Bureau 
customarily uses the imputation method to assign persons 

and characteristics to specific households before the census 
enumeration is completed, this approach has the advantage 
over the other procedures with regard to earliest implementation. 

Subnational estimates of the undercount using the 
demographic method would not be possible until after the 
national estimates were derived by this procedure. Since it 
took about 3 years to derive the national estimates of the 
undercount in the 1970 census, one could not expect to 
obtain national estimates of the undercount using the 
demographic method before 1982. Additional years would 
be required to derive reliable estimates of the undercount by 
the demographic method for all 50 States. Somewhat similar 
protracted timing would be anticipated, using the matching 
method to derive reliable estimates of the undercount for 
each of the 50 States and the District of Columbia. It is not 
even possible to estimate the amount of time that would be 
needed to derive reliable estimates of the undercount for 
geographical areas below the State level using either the 
demographic or matching method. 

However, once national estimates of the undercount were 
derived by another procedure, the synthetic method could be 
used to immediately derive subnational estimates of the 
undercount for all jurisdictions below the national level. If, 
for example, national estimates of the undercount were 
derived by the demographic method by 1982, subnational 
estimates could be derived for all jurisdictions within this 
Nation within a matter of weeks by using the synthetic method.

Since the primary objective for making adjustments below 
the national level is to minimize the inequities to States and 
local areas as a result of the undercount, a method that can 
reliably make such adjustments within a relatively short 
period of time is most desirable. 

4. Flexibility. Flexibility is another important attribute of 
a method for deriving undercount estimates. More specifi- 
cally, a method that could produce undercount estimates for 
all jurisdictions regardless of size would be highly desirable 
and useful to most elected officials and other policymakers. 

Unfortunately, most moderate and small-size jurisdictions 
are traditionally penalized for their size. Since it is easier to 
derive more reliable estimates (of the undercount, population 
projections, unemployment figures, etc.) for larger jurisdic- 
tions, smaller size areas are usually excluded. 

However, since the primary objective for adjusting for the 
census undercount is to reduce inequities to States and local 
areas, we feel that a method that does not exclude small 
areas from such adjustments should be given much greater 
weight than procedures that tend to favor larger size 
jurisdictions. For example, 27,000 (or 69 percent) of the 
39,000 units of government receiving general revenue sharing 
funds had populations under 2,500. 

The synthetic method appears to have greater flexibility 
than either the demographic or matching method in its 
ability to derive undercount estimates for all jurisdictions
regardless of size. Another important issue regarding the
flexibility of adjustment procedures is the ability to produce 
undercount estimates for other major ethnic groups, such as Hispanics.

The extensive exploratory research conducted by the 
Census Bureau in this area concluded that it was very 
difficult to derive undercount estimates for Hispanics at the 
national level using the demographic method because of the 
absence of Hispanic origin and inconsistent Hispanic classi- 
fications in birth and death records in most States [12]. 

Theoretically, the matching method could produce under- 
count estimates for Hispanics, if reliable lists or rosters of 
Hispanic households existed or if special-purpose Hispanic 
household surveys were conducted. Unfortunately, such lists 
are yet to be found and such surveys are yet to be 
conducted. Thus, for all practical purposes, the feasibility of 
the matching method for such purposes has yet to be demonstrated.

On the other hand, the imputation method appears to 
have the greatest potential for deriving undercount estimates 
for Hispanics. As a start, initial estimates might be derived 
based on the nature and numbers of imputed persons of 
Hispanic origin and on the extent of allocations of charac- 
teristics to Hispanic households. 
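The idea of building initial estimates from imputation counts can be sketched with invented figures: sum the persons imputed across localities, express them as a share of the imputation-inclusive total, and carry that national rate back down synthetically to any area. The locality figures below are hypothetical.

```python
# Hypothetical sketch: derive a national undercount rate for a group from
# imputation counts, then apply it synthetically to a subnational area.
# All figures are invented for illustration.

localities = [  # (enumerated Hispanic persons, Hispanic persons imputed)
    (40_000, 2_400), (25_000, 1_000), (60_000, 3_000),
]

enumerated = sum(e for e, _ in localities)
imputed = sum(i for _, i in localities)
national_rate = imputed / (enumerated + imputed)  # stage 1: national estimate

def synthetic(area_count, rate):
    """Stage 2: carry the national rate back down to an area's count."""
    return area_count / (1.0 - rate)

print(f"national rate {national_rate:.3%}, adjusted count for a "
      f"10,000-person area: {synthetic(10_000, national_rate):,.0f}")
```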

The ability of the synthetic method to derive subnational 
undercount estimates for Hispanics is directly constrained by 
the flexibility of the method used to derive the national 
estimates. Thus, synthetic methods that are based on 
national estimates derived by the demographic method are
not able to derive estimates of the undercount for Hispanics. 
But synthetic methods that are based on national estimates 
derived by either the imputation or matching method would 
be able to derive subnational undercount estimates for 
Hispanics. Some exploratory work by the NUL Research 
Department strongly suggests subnational undercount esti- 
mates for Hispanics can be developed by using a synthetic 
method based on national estimates derived from imputation procedures.

5. Equity. Most discussions about the relative equity of 
different undercount adjustment methods fail to distinguish 
between two types of equity: statistical and political. 
Statistical equity results from enhancing the quality of 
statistics that may be used as individual data elements in a 
distribution formula, while political equity results from the 
actual distribution of resources according to a configuration 
of all the elements in a formula. Enhancing statistical equity 
does not necessarily lead to greater political equity. The two 
vary independently of each other and should be kept 
conceptually distinct. 

For example, adjusting population figures for the under- 
count in the general revenue sharing formula (or even in 
formulas in which population is the major data element) by 
the most reliable method possible will not ensure that there 
will be a more equitable distribution of resources to States 
and local areas that are undercount-prone. 
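A toy computation with invented figures illustrates why the two kinds of equity vary independently: in a formula where population is weighted by other elements (here a hypothetical "tax effort" factor), adjusting one area's population upward by 8 percent raises its allocation by considerably less than 8 percent.

```python
# Toy allocation formula with invented figures: allocation is proportional to
# population times a hypothetical "tax effort" weight. Improving the population
# element (statistical equity) does not translate one-for-one into a larger
# share of funds (political equity).

def allocations(populations, tax_effort, pot=1_000_000):
    weights = [p * t for p, t in zip(populations, tax_effort)]
    total = sum(weights)
    return [pot * w / total for w in weights]

pop_raw = [100_000, 100_000]   # two areas with equal enumerated populations
pop_adj = [108_000, 100_000]   # area 0 adjusted up 8% for the undercount
effort = [0.8, 1.2]            # area 1 carries the higher formula weight

before = allocations(pop_raw, effort)
after = allocations(pop_adj, effort)
gain = after[0] / before[0] - 1.0
print(f"area 0 allocation rises by {gain:.1%}, not 8.0%")
```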

The distribution of resources to areas is primarily a 
political, not a technical, determination. Statisticians and 
technicians can be very influential in (a) recommending the 
data elements to be used in a distribution formula and (b) in 
determining the quality of the data elements. But politicians 
are the primary determinants of the final configuration of 
data elements in allocation formulas. This phenomenon is 
popularly referred to as "computer politics":

The formula is a tool for performing a very old political 
balancing act: putting the money where the needs are 
while making sure that every congressional district gets 
something. Formulas are supposed to provide a fair, 
objective distribution of Federal aid. But formula 
elements are chosen politically and seemingly minor 
changes can mean boom or bust for some recipients of 
aid. . . . Formulas are modified to accommodate some
statistical and political realities. . . .

These compromises illustrate a tension in formula building 
between the technician's desire to draft a theoretically 
pure statistical model based on objective data and the 
politician's need to find a mathematically plausible way to 
put money where both the needs and the votes are [8] . 

Consequently, it is not fruitful for technicians to engage 
in extended debates about the amount of (political) equity 
that would result from the distribution of resources to areas 
based solely on the improvement of the quality of particular 
data elements (i.e., statistical equity). Any simulations done 
should be viewed simply as illustrative. 

It is very likely that, if some adjustment of population 
figures for the undercount was adopted by Congress to use in 
existing allocation formulas, the formulas would not remain 
the same. In all probability, the configuration of the data 
elements in existing formulas would almost certainly be 
modified to satisfy current political realities. 

Thus, the primary objective of technicians should be to 
determine the most reliable and useful method for improving 
the quality of population figures, regardless of how that data 
element may eventually be used in different allocation 
formulas. In short, the merits of one adjustment method over 
another should be assessed on statistical, and not political, grounds.

On balance, we feel the synthetic and imputation methods 
could provide greater statistical equity to subnational juris- 
dictions than either the demographic or matching method. 
The ability of the synthetic method and imputations to 
provide estimates of the undercount for all geographical 
areas— regardless of size— is a major contributor to statistical equity.

Such simultaneous adjustments for all jurisdictions would 
give smaller jurisdictions the same probability of benefiting 
(or losing) as larger jurisdictions. 

It should be kept in mind that these smaller jurisdictions 
will still be penalized for their size in other ways. Adjustment 
of the census undercount only reflects the size of the
population on Census Day, April 1, 1980. Since it is more
difficult to derive reliable postcensal estimates of population 
for small areas, their 1980 population figures may be used 
for the entire decade as a basis for governmental allocations, 
while larger areas would be able to have updated population 
figures used in allocation formulas. Of course, small areas 
that lost population would prefer to use the 1980 figure, 
while those gaining population would want to use updated 
estimates in order to gain political equity. 

The ability of the synthetic method to derive subnational 
estimates of the undercount for ethnic groups, such as 
Hispanics, further enhances its (statistical) equity. Of course, 
this can only be accomplished after national estimates of the 
undercount are first derived through either the imputation or 
matching method. 

Based on its exhaustive assessment regarding the data 
elements in the general revenue sharing formula, the Stanford 
Research Institute concluded that equity would be enhanced 
by using the synthetic method to adjust for the census 
undercount at the State level: 

Equity of allocations to the 50 States and the District of 
Columbia can be increased by adjusting at the State level 
for underenumeration, using the national age/sex/race 
underenumeration rates prepared by the Bureau of the 
Census. If the national rates are used to adjust for 
underenumeration at the county-area and local govern- 
ment levels, equity of allocations is likely to increase for 
larger jurisdictions and to decrease for many smaller 

It is SRI's judgment, however, that on balance, the 
accuracy of the State-area population estimates would be 
improved as a result of this procedure. Although it was 
not possible to prove that increased equity would result, 
the reduction of biases due to underenumeration at the 
State level is viewed as a positive step by SRI [9] . 

However, SRI was not quite as confident about the extent of 
equity that could be gained by using the synthetic method to
adjust for the undercount below the State level: 

The adjustment for underenumeration at the county level 
would generate negligible changes in equity, on the 
average. Furthermore, the assumption that the national 
underenumeration rates for the 96 age, race, and sex 
categories apply uniformly across all county areas is 
difficult to defend. Even an unjustified attempt to adjust 
for underenumeration effects in the 1970 census data is 
considered by some to be better than no attempt at all. 
However, from an overall point of view, considering all 
units of government, the increase in equity is question- 
able. Additional research is needed before under- 
enumeration rates can be accurately portrayed at the local 
level. The Bureau of the Census and other organizations 
are urged to continue and accelerate the research [9] . 

It is not entirely clear whether SRI is referring only to 
political equity or to statistical equity, or to both, as a result 
of adjustments for the undercount at the State and sub-State
levels. But its conclusions are largely based on the assumption
that equity (whether political or statistical) is inversely
related to size of jurisdictions: Greater equity will be 
achieved by adjusting for the undercount for large jurisdic- 
tions (such as States) and little equity or, possibly, greater 
inequities might result from adjustments for small areas (such 
as counties and cities). We do not think that such an 
assumption is warranted until one has developed illustrative 
allocation formulas that clearly distinguish between political 
and statistical equity. 

6. Reliability. While it is highly desirable for any method 
used to adjust for the undercount to produce undercount 
estimates that are internally consistent, timely, flexible, and 
equitable, it is essential that such estimates be reliable and 
accurate. The demographic method is widely regarded as 
producing the most reliable undercount estimates at the 
national level at present. However, the Census Bureau's own 
studies reveal that the demographic method is not a feasible 
procedure for deriving reliable estimates of the undercount 
below the national level because of severe deficiencies in vital 
statistics and migration data at the State and local levels. But 
the matching method is even less likely than the demographic 
method to produce reliable estimates of the undercount at 
any level— national, State, or local. 

This leaves either the synthetic method, imputations, or a 
combination of both as a basis for producing reliable 
subnational estimates of the undercount. There is growing 
consensus among research analysts that the synthetic method 
is a viable procedure for deriving reliable estimates of the 
undercount— at least at the State level. In fact, the National 
Commission on Employment and Unemployment Statistics 
strongly urged that the synthetic method be used to adjust 
for the undercount in governmental labor-force statistics: 

Synthetic estimates of undercoverage for each State by 
race and sex must be viewed with extreme caution. A 
major problem is that interstate migration statistics are 
not collected. But even without reliable State undercount 
estimates, State and local data could be adjusted
according to national undercount estimates. In fact, State 
estimates are presently prepared by using the national 
second-stage ratio adjustment weights. The question, then, 
is not whether to adjust area data to national population 
controls, but which controls to use— incomplete popula- 
tion figures or figures adjusted for the census 
undercount [3] . 

Yet, synthetic adjustments of the undercount do not have 
the same reliability for areas of differing sizes. While we take 
exception to the assumption that statistical equity is 
necessarily inversely related to geographic size, we must agree 
with the assumption that reliability is inversely correlated 
with size. The reliability of estimates (whether they be 
population projections, unemployment statistics, income, or 
the census undercount, etc.) is clearly a function of the size 
of units. While synthetic estimates (or estimates derived by 
any other method, for that matter) will decline in reliability
the smaller the jurisdiction, we do not yet know at what level
the unreliability becomes so severe that no estimates should 
be used. 

However, in order to ensure that every jurisdiction, 
regardless of size, has an equal opportunity to have its 
population figures adjusted for the undercount, we still feel 
that synthetic adjustments should be made for all sub- 
national jurisdictions. Since most small areas will be 
penalized later by not being able to have updated postcensal 
population estimates used in governmental allocation 
formulas, these initial adjustments would minimize the 
statistical inequities to small areas. 

Moreover, we feel that such adjustments for the under- 
count are preferable to making the current assumption that 
no births, deaths, or migration occurred in most small areas 
over a period of 10 years. Strangely, there appears to be little 
interest in determining the margin of error, unreliability, or 
inequity that accrues to small areas by accepting this clearly 
fallacious assumption of no change over a decade! 


The census undercount has significant social, economic, 
political, and legal ramifications. The disproportionate under- 
count of blacks and other minorities not only severely flaws 
adequate planning for their needs, but also deprives these 
groups of equitable resources and services. Moreover, since 
population figures are used by over 100 Federal programs to 
allocate billions of dollars each year, States and local areas
with disproportionate undercounts are deprived of their 
proper share of governmental grants-in-aid. And, since the 
census count is used as the basis for allocating representation 
in State and local policymaking bodies, as well as in the 
House of Representatives, localities with disproportionate 
undercounts are also deprived of their equitable share of 
political representation. Thus, such inequities require 
immediate redress. But what method should be used to 
correct for the undercount in population figures? 

There is widespread agreement that the demographic 
method currently produces the most reliable estimates of the 
undercount at the national level. But there is also increasing 
consensus among researchers that the synthetic method is a 
viable procedure for deriving reliable estimates of the 
undercount— at least at the State level. Our analysis strongly 
suggests that the synthetic method is the most appropriate 
procedure for deriving subnational undercount estimates that 
are internally consistent, timely, flexible, and equitable. 
Moreover, the method is relatively simple to administer and 
can be implemented by nonstatisticians. While the reliability 
of synthetic estimates for larger jurisdictions appears to be 
high, the reliability of estimates for small areas is uncertain.

We strongly urge further research into the feasibility of 
deriving national and subnational undercount estimates based 
on the Census Bureau's imputation procedures. It appears
that the imputation method might be the most appropriate
procedure for deriving estimates of the undercount for other 
ethnic groups, such as Hispanics, Asian Americans, and 
Native Americans. Moreover, the imputation method is 
superior to all other methods in its ability to be implemented 
earliest. Imputed or allocated figures adjusted for the census 
undercount would be incorporated in the official counts 
before they are turned over to the President and Congress. 

Key Recommendations 

1. We strongly recommend use of the synthetic method 
as a viable procedure for deriving estimates of the 
undercount for States and local areas, regardless of size.

2. We further urge that synthetic estimates of the 
undercount for States and local areas be used in 
governmental grants-in-aid distribution formulas to 
those areas. 

3. We urge serious consideration be given to using 
imputed figures adjusted for the census undercount in 
the official census counts that are turned over to the 
President and used as a basis for congressional appor- 
tionment as well as for governmental funding 

4. We urge that further research be conducted into the 
feasibility of deriving estimates of the undercount for 
other ethnic groups, such as Hispanics. While such 
research is underway, we recommend that the national 
undercount rates for blacks be used for Hispanics in 
order to permit subnational estimates of the under- 
count to be derived for Hispanics based on the 
synthetic method. 3 

3 We feel that the imputation method is one of the most viable 
procedures for deriving estimates of the undercount for nonblack 
minority groups— especially Hispanics and Asian Americans. In fact, 
estimates of the undercount for each of these groups could be derived 
at the national level by aggregating the figures relating to the percent 
of each of these groups that were imputed at the local level by sex 
and age. Then the synthetic method could be used to derive estimates 
of the undercount for each of these groups for various units below the 
national level. 


References

1. Heer, David. Social Statistics and the City. Cambridge,
Mass.: Joint Center for Urban Studies, 1968.

2. Hill, Robert B., and Steffes, Robert B. "Estimating the 
1970 Census Undercount for States and Local Areas." 
Washington, D.C.: National Urban League Research 
Department, July 1973. Reprinted in Urban League 
Review, Vol. 1, No. 2 (Fall 1975) 36-45. 

3. National Commission on Employment and Unemploy- 
ment Statistics. Counting the Labor Force. Washington, 
D.C.: U.S. Government Printing Office, 1979, 141-143. 


4. National Research Council. America's Uncounted 
People. Report of the Advisory Committee on Problems 
of Census Enumeration. Washington, D.C.: National 
Academy of Sciences, 1972. 

5. Counting the People in 1980: An Appraisal of Census
Plans. Report of the Panel on Decennial Census Plans.
Washington, D.C.: National Academy of Sciences, 1978.
See especially dissent by Eddie Williams and Leobardo
Estrada. Also, 98-102 and 106-108.

6. Savage, I. Richard, and Windham, Bernard M. "The 
Importance of Bias Removal in Official Use of U.S. 
Census Counts." FSU Statistics Report, No. 265.
Tallahassee, Fla.: Florida State University, July 1973.

7. Siegel, Jacob S. "Estimates of Coverage of the Popula- 
tion by Sex, Race, and Age in the 1970 Census." 
Washington, D.C.: Bureau of the Census, April 25, 1973. 

8. Stanfield, Rochelle L. "Playing Computer Politics with 
Local Aid Formulas." National Journal, December 9, 
1978. Reprinted in National Commission on Employ- 
ment and Unemployment Statistics, [3], appendix, vol.
III, 332-341. Thus, it is erroneous to assume that using
population figures adjusted for the census undercount 
would not result in some change in the original 
allocation formulas. It is more than likely, if past 
experience is any indication, that some "hold-harmless" 
provisions would be incorporated to minimize any sharp 
changes in financial allocations to local areas at any one 
time due to using adjusted population figures. 

9. Stanford Research Institute. General Revenue Sharing 
Data Study: Executive Summary, Vol. 1. Washington, 
D.C.: Office of Revenue Sharing, Department of the 
Treasury, August 1974. 

10. Strauss, Robert P., and Harkins, Peter B. 1970 Census 
Undercount and Revenue Sharing: Effect on A/locations 
in New Jersey and Virginia. Washington, D.C.: Joint 
Center for Political Studies, June 1974. 

11. U.S. Department of Commerce, Bureau of the Census. 
"Coverage of Population in the 1970 Census and Some 
Implications for Public Programs." Current Population 
Reports, Series P-23, No. 56, August 1975. 

12. "Coverage of the Hispanic Population of the U.S. in
the 1970 Census: A Methodological Analysis." Current
Population Reports, Series P-23, No. 82.

13. "Developmental Estimates of the Coverage of the
Population of States in the 1970 Census: Demographic
Analysis." Current Population Reports, Series P-23,
No. 65, December 1977.

14. "General Social and Economic Characteristics: U.S.
Summary." 1970 Census of Population. Washington,
D.C.: U.S. Government Printing Office, 1972. See
allocation tables C-1 to C-4 in appendix C.

15. "The Social and Economic Status of the Black
Population in the United States: An Historical View,
1790-1978." Current Population Reports, Series P-23,
No. 80, 1979, 9-10.

16. U.S. House of Representatives, Committee on Post 
Office and Civil Service, Subcommittee on Census and 
Statistics. "Accuracy of 1970 Census Enumeration and 
Related Matters." Washington, D.C.: U.S. Government 
Printing Office, September 15, 16, 22-24, 29 and 30, 
1970. Illinois Council for Black Studies. "Black People 
and the 1980 Census: A Conference on the Population 
Undercount." Proceedings of a conference held 
November 30-December 1, 1979, Chicago, Ill.

17. U.S. House of Representatives, Committee on Post 
Office and Civil Service, Subcommittee on Census and 
Population. "The Use of Population Data in Federal 
Assistance Programs." Washington, D.C.: U.S. Govern- 
ment Printing Office, December 29, 1978. 


Joseph Waksberg 


Although the adjustment of census counts by the Census 
Bureau's Director might set a precedent that may later be 
regretted, the National Academy of Sciences' panel has 
concluded that an adjustment could be made with only 
minor danger of having the figures manipulated for political 
purposes or having them distrusted and repudiated by the 
public. The key to this is an early decision and announce- 
ment of the procedure to be followed in adjusting the data. 
The procedure should be described in sufficient detail so as 
to leave no possibility of the perception that the figures were 
"manipulated" after the census counts were seen. As a 
consequence, the arguments for adjustment seem convincing. 
However, there is still a question of how to adjust. 

Synthetic estimation is the best way of proceeding. In 
arguing for this procedure, it was not necessary for Dr. Hill 
to assume that undercoverage rates are equivalent across 
locales, however. For the procedures to be desirable, it is not 
necessary for the undercoverage rates to be equivalent, only 
that acting as if they are equivalent would produce a fairer 
distribution of funds. Although Dr. Hill proposed using 
national estimates for race, sex, and age for the synthetic 
estimates, it is possible to base synthetic adjustment on other 
kinds of areas (regions, States) or other kinds of subgroups, 
if acceptably accurate estimates of undercoverage of the 
subgroups were available. 

It is useful to examine the criteria leading to Dr. Hill's 
preference for a synthetic method of adjustment— namely, 
internal consistency, simplicity, timeliness, flexibility, 
equity, and reliability. These are important attributes for any 
adjustment method (although perhaps equity and reliability 
are not separable), but two other criteria should be added: 
(1) The adjustment method should have a high probability of
public acceptance (giving further emphasis to the idea of 
simplicity), and (2) the adjustment should be nontrivial. This 
last criterion is related to whether an adjustment should be 
made at all. Cutoff values for the undercount rate below 
which no adjustment would be made should be established for 
the overall undercount rate and/or the undercount rates for 
subgroups. An adjustment should not be made unless it 
makes an important difference in some areas at least. 

Equity is central to the issue of adjustment. The purpose 
of adjustment is to improve equity (statistical, not political), 
and the Census Bureau's role is to see to it that the intent of 
legislation or executive directives is carried out as faithfully 
as possible. 

While the Canadian procedure deals with estimates of 
undercount based on sample surveys, not demographic
analysis, and is not immediately applicable to the types of
synthetic estimates proposed by Dr. Hill, the principle of 
establishing a criterion based on overall performance of 
technique is important and transferable. Accepting the 
mean-squared error procedure or similar measures implies 
that the Bureau should not be distracted by the fact that 
errors in some localities may be increased by an adjustment 
technique. It is unrealistic to require improvement in all 
areas. Furthermore, even the fact that there may be some 
classes of areas that are adversely affected in a similar way 
should not be a deterrent to accepting an adjustment 
technique if it improves overall equity, that is, reduces the 
measure of inequity. 
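The overall-performance criterion can be sketched with invented figures: judge an adjustment by a summary measure of inequity, such as mean squared error against assumed "true" populations, accepting it when total error falls even though one area's error grows.

```python
# Sketch of an overall-performance criterion with invented figures: an
# adjustment is accepted because the summary measure of inequity (mean
# squared error against assumed "true" populations) falls overall, even
# though one area's individual error grows.

def mse(estimates, truth):
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)

truth    = [105_000, 98_000, 210_000]   # assumed true populations
census   = [100_000, 97_000, 200_000]   # enumerated counts
adjusted = [104_000, 99_500, 206_000]   # after a uniform upward adjustment

print(f"MSE before {mse(census, truth):,.0f}, after {mse(adjusted, truth):,.0f}")
# The middle area's error grows (1,000 to 1,500), yet overall MSE falls.
```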

If one assumes the national demographic estimates of 
undercount are virtually without error (which is reasonable 
except for Hispanics), then it appears that the use of synthetic
estimates will reduce inequity unless there is a very strange 
distribution among areas of undercoverage rates within 
age-sex groups, which seems unlikely to occur in practice. 

Dr. Hill's preference for use of national estimates, rather 
than regional or State estimates, for use in synthetic 
estimation is based on the criteria of timeliness and 
simplicity. These are important attributes and should be 
abandoned only if there is clear and definitive evidence that 
the Census Bureau could produce sufficiently reliable sub- 
national estimates to compensate for the loss of time and 
simplicity. The careful itemization of assumptions necessary 
to produce Mr. Siegel's State demographic estimates made 
quite clear the uncertainty of State estimates. Similarly, past 
U.S. experience with dual-measurement systems based on 
surveys or reverse record checks is not very encouraging. 

Because of the uncertainty at this time in the way the 
Bureau would produce State estimates and the fact that the 
methodology is still quasi-experimental, fairly firm rules of 
how State estimates would be derived could probably not be 
established in advance, as recommended by Dr. Keyfitz. This 
ability to state rather precisely in advance the methodology 
to be used is a crucial requirement and seems to be another 
reason to prefer Dr. Hill's approach, rather than other methods.

The synthetic estimate proposed by Dr. Hill is based on 
race-sex-age undercount rates. Although sex and age do not 
affect the results significantly, and while using these 
characteristics goes against the simplicity criterion, an 
adjustment using race-sex-age rates is preferable. The 
procedure to be used will have to be clearly explained to the 
public; if it is well known that fairly accurate estimates of
undercount exist for sex and age, it would be difficult
to explain why cruder classifications are used than are available.

There are two points to consider in the decision about
whether to adjust through imputation or through other
methods: (1) The technical issue of what is the easy way and
methods: (1 ) The technical issue of what is the easy way and 
what is the best way to adjust, and (2) the kinds of effects 
the adjustment methods would have on the statistics that 
would be used in the formulas. 

In considering two ways of adjusting for the undercount—
(1) adjusting the population counts only, leaving the
characteristics unchanged, and (2) actually imputing persons by
duplicating persons at random—the choice between the two is
likely to make a significant difference in the per capita
income figures. The lower per capita income of the black
population will be reflected in different ways by the two
procedures in the per capita income figures produced for
cities, States, etc. This should be considered as part of the
issue of how to adjust.
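Invented figures make the difference between the two routes concrete. Route 1 inflates the population counts but leaves aggregate income untouched; route 2 duplicates persons at random within each group, so their incomes are carried along, and the resulting per capita incomes differ.

```python
# Hypothetical illustration of the two adjustment routes for a city where
# the black population is undercounted by 8% and has lower mean income.

pop = {"black": 50_000, "white": 100_000}
mean_income = {"black": 8_000, "white": 12_000}
inflate = {"black": 1 / (1 - 0.08), "white": 1.0}  # 8% black undercount

total_income = sum(pop[g] * mean_income[g] for g in pop)
adjusted_pop = sum(pop[g] * inflate[g] for g in pop)

# Route 1: adjust the counts only; aggregate income is unchanged.
pci_counts_only = total_income / adjusted_pop

# Route 2: duplicate persons within each group; their incomes come along.
pci_duplication = (sum(pop[g] * inflate[g] * mean_income[g] for g in pop)
                   / adjusted_pop)

print(f"counts only: ${pci_counts_only:,.0f}  duplication: ${pci_duplication:,.0f}")
```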

In closing, all of the discussion thus far assumes that it 
will be the Census Bureau estimates of undercount that will 
be used for adjustment. This is a reasonable approach. No 
one has questioned the Bureau's competence in this area, nor 
its objectivity or integrity. 


Four questions concerning an adjustment were raised for
consideration: (1) Should an adjustment be made? (2) What
agency should make the adjustments? (3) How is knowledge 
about undercount to be obtained? and (4) How should an 
adjustment be made? The group should express its opinion 
on the questions of whether or not to adjust. Discussion first 
concentrated on question 2, and it was generally agreed that 
any adjustments should be made by the Census Bureau. 
Knowledge of the undercount must be obtained outside the 
Bureau because the Bureau cannot do much better in 
reducing the undercount with any expenditure of money. 
Therefore, the methods must tend toward demographic 
analysis; a postenumeration survey, in which missing persons 
would not be found in any large numbers; or an adminis- 
trative records match. Thus, it will be very hard to obtain 
data and any information obtained would be subjective. The 
Census Bureau should decide on a method of adjustment. It 
would be a mistake to freeze the method of estimation to, 
say, synthetic, as better methods will be developed. 

A distinction between the short-term and long-term
objectives was suggested, however. Short term means what
will be done in the 1980 census, while long term means 1990 
and beyond. If the view of the National Academy of Sciences 
is taken that it is necessary to describe the procedure in 
advance of the census counts, then there is not much time 
for a decision to be made— 3 to 4 months. 

It was made clear, however, that the Census Bureau was 
not endorsing the use of a synthetic method by using it as an 
illustration to judge the kinds of impacts that implementing 
various kinds of public programs would have. In particular, 
the National Commission on Employment and Unemploy- 
ment Statistics never endorsed the synthetic method, 
although it did call for adjustment of the CPS for purposes of 
measuring unemployment. While a synthetic method may be 
used to measure the heart and substance of a broad 
phenomenon like health, there are only rare instances where 
it has been used to measure error. 

The discussion seems limited to simple schedules for making 
an adjustment. The possibility of using synthetic estimation 
when data become available for it, about 1981, and then 
implementing a broader technique with fuller data in 1983 or 
1984 to make another adjustment, should be considered. 
There is no evidence that public acceptance and 
understanding of the methodology are needed. 

It was also argued that, given the constraints of time, the 
Census Bureau may have to use the simplest method. As 
soon as the postcensal estimates are published, the counts for 
1980 become irrelevant. For published postcensal 
estimates, the Census Bureau uses a series of estimates with 
lists of assumptions that all seem to be most feasible and 
sensible. In that situation, the difficulties of using rates of 
undercount within the estimation method seem much less 
problematic than the problems of applying an undercount 
adjustment to the census counts themselves. Of course, the 
adjustments should be incorporated in the postcensal 
estimates. The objection to 
adjustment because there would be two sets of census figures 
created would not exist if the adjustment is only made to the 
postcensal estimates. Also, while the main objection to 
imputation is one of constitutionality, it also removes the 
urgency of trying to get all individuals to return census 
questionnaires and the Census Bureau should be praised for 
backing away from making more and more imputations. 

Several additional points were stressed: 

1. Estimates of the Hispanic population range from 12 
million to 20 million persons. There should be an 
adjustment to Hispanic counts because language 
problems and undocumented workers may cause large 
undercounts. Use of the black undercount rates to 
adjust for Hispanics is the best method currently 
available. 

2. The undercount for demographic subgroups should not 
be assumed to be the same in all locales; this would be 
counterintuitive. In fact, an important objective of the 
Census Bureau's evaluation should be to look at 
variability in rates of undercounts for race-sex-age 
subgroups in different situations. 

3. The ratio of undercounts to imputations may vary and 
more research should be done into the variability of 
imputation and overall undercount by different race 
groups. This may yield some way of projecting what 
the variability will be in 1980 for use as a provisional 
kind of measure. 


The Impact of an Adjustment to the 1980 
Census on Congressional and 
Legislative Reapportionment 

Carl P. Carlucci 

New York State Assembly 
Reapportionment Task Force 


In 1962 the U.S. Supreme Court opened the reappor- 
tionment process to the scrutiny of the Federal courts 
[Baker v. Carr, 369 U.S. 186 (1962)]. The Court had 
previously denied relief in cases challenging redistricting 
plans. In 1946 Justice Frankfurter argued that "courts ought 
not to enter this political thicket" [Colegrove v. Green, 
328 U.S. 549 (1946)] . By 1964 the mood of the Court had 
changed and in the landmark cases of Wesberry v. Sanders, 
376 U.S. 1 (1964), and Reynolds v. Sims, 377 U.S. 533, 
reh. den. 379 U.S. 870 (1964), the Supreme Court entered 
the "political thicket." The nature of the Court's rulings in 
these cases was such that the decennial census could even- 
tually become a major factor in reapportionment and 
redistricting. 

The rulings in both of these cases rested on the issue of 
equal representation. Prior to these rulings, and the many 
cases that followed, many States had not reapportioned 
despite population changes. Only about one-half of the 
States reapportioned their legislatures after the 1950 census, 
and many had not reapportioned for decades before [4]. The 
Supreme Court ruled in Wesberry v. Sanders that Article I, 
section 2, of the Constitution required that "as nearly as is 
practicable one man's vote in a congressional election is to be 
worth as much as another's." 

In Reynolds v. Sims, the Court held that: 

The Equal Protection clause requires that a State make an 
honest and good faith effort to construct districts, in both 
houses of its legislature, as nearly of equal population as 
is practicable. 

Both the Congress and the State legislatures were now firmly 
tied to a population standard. 

In cases that followed, the Court refined the population 
standard. The Court interpreted "as nearly of equal popula- 
tion as is practicable" as near mathematical equality for 
congressional districts, where less than 1 percent is the goal, 
and as up to a 10-percent deviation among State legislative 
districts. 
The 20-plus major Supreme Court cases that followed 
Baker v. Carr, 369 U.S. 186 (1962), dealt with such topics 
as the difference between legislative and congressional 
districts, how strict the population equality standard should 
be in various circumstances, and how much flexibility States 
should be allowed in drawing districts. Populations, average 
populations, population variances, population differentials, 

population deviations, and population ratios are repeatedly 
discussed in the cases, analyses, articles, and textbooks that 
deal with reapportionment and redistricting. Court cases 
commonly discussed the population variance or differential 
of proposed districting plans, comparing calculations of 
overall population variance and rationale with court-set 
standards for maximum deviation. Texts on the subject 
regularly contain a State-by-State analysis of district sizes 
and average and/or maximum percentage deviation in 
population per seat. 
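The deviation measures these sources discuss reduce to simple arithmetic: the overall percentage deviation of a plan is the range of district populations divided by the ideal (mean) district size. A minimal sketch, using hypothetical district populations:

```python
def max_percent_deviation(district_pops):
    """Overall percentage deviation of a districting plan: the spread
    between the largest and smallest districts, expressed as a
    percentage of the ideal (mean) district population."""
    ideal = sum(district_pops) / len(district_pops)
    return 100 * (max(district_pops) - min(district_pops)) / ideal

# Hypothetical four-district plan; the ideal district size is 500,000,
# so the 10,000-person spread is a 2-percent overall deviation.
deviation = max_percent_deviation([498_000, 502_000, 495_000, 505_000])
```

This is the figure courts compare against their maximum-deviation standards; other statistics in the case law (average deviation, population ratios) are computed from the same inputs.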

There has been relatively little discussion of the role of 
the census in this process. In Professor Robert Dixon's 
1968 work, Democratic Representation: Reapportionment in 
Law and Politics, the census is mentioned as part of a num- 
bers game where plaintiffs use census data to demonstrate 
"numerical disparities" [2] . Good government groups rely 
extensively on population statistics to "prove" that their 
proposals for "independent reapportionment commissions" 
are needed [8] . 

The most common discussions of reapportionment and 
redistricting focus on questions of legal rulings and court 
intent. State legislatures and their staffs are even now 
pondering the question: what will the courts require or 
permit in the construction of legislative and congressional 
districts? Speculation revolves around how strictly the 
courts will enforce the population equality rule and how 
much population variance, maximum and average, will be 
tolerated [5] . Population data are crucial to these discussions, 
but the census and its accuracy have not been raised as 
issues. 

The Census and Reapportionment 

Where does the census emerge as an issue? As we get 
closer to the taking of the census in April 1980, the Bureau's 
estimates and projections have generated interest in both the 
taking of the census and its use in reapportionment/redis- 
tricting. In July of 1979, the Bureau issued a set of 
population estimates for 1978 and the "reapportionment" 
of Congress that would result if their figures were used as 
the apportionment base. 

According to Article I, section 2 of the U.S. Constitution: 

Representatives and direct Taxes shall be apportioned 
among the several States which may be included in this 
Union, according to their respective numbers .... The 
actual Enumeration shall be made within three Years 
after the first Meeting of the Congress of the United 



States, and within every subsequent Term of ten Years, 
in such Manner as they shall by Law direct. 

Under the latest provisions of title 13 of the United States 
Code, the Secretary of Commerce is directed to take a 
census of population as of April 1st of every 10th year, 
starting in 1980, and to report the results of this "decennial 
census" to the President within 9 months of its taking, 
for use in apportionment of Representatives in Congress. 
Title 2 of the United States Code specifies the procedure to 
be used in apportionment. The Census Bureau is responsible 
for the preparation of a report including the population of 
each State and 

the number of Representatives to which each State would 
be entitled under an apportionment of the then existing 
number of Representatives by the method known as the 
method of equal proportions, no State to receive less than 
one member. 1 
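The "method of equal proportions" named in the statute gives every State one seat and then assigns each remaining seat to the State with the highest priority value P/√(n(n+1)), where P is the State's population and n its current number of seats. A minimal sketch of that procedure (the three-State populations are hypothetical; the actual computation uses all States and 435 seats):

```python
import heapq
import math

def equal_proportions(populations, total_seats):
    """Method of equal proportions: every state starts with one seat;
    each remaining seat goes to the state with the highest priority
    value P / sqrt(n * (n + 1)), where n is its current seat count."""
    seats = {state: 1 for state in populations}
    # Max-heap via negated priorities; initial priority uses n = 1.
    heap = [(-pop / math.sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(total_seats - len(populations)):
        _, state = heapq.heappop(heap)
        seats[state] += 1
        n = seats[state]
        heapq.heappush(heap, (-populations[state] / math.sqrt(n * (n + 1)), state))
    return seats

# Hypothetical three-state example apportioning 6 seats:
demo = equal_proportions({"A": 6_000_000, "B": 3_000_000, "C": 1_000_000}, 6)
```

Because seats are awarded one priority value at a time, a shift of only a few hundred persons in one State's count can reorder the final priorities, which is the sensitivity the footnoted discussion of the method describes.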

The Bureau's provisional estimates of population and 
congressional districts were presented in a news release titled, 
"New Census Population Estimates Indicate Extensive 
Congressional Redistricting After 1980" [13]. Despite the 
warnings that "redistricting" (i.e., the actual drawing of 
new district boundaries) cannot be forecast with certainty, 
the press and the political world found the news irresistible. 
As the Bureau pointed out, the actual redistricting is very 
sensitive to the detail and accuracy of the population base, 
and some States will have to draw new district lines to ensure 
that all districts meet court-set standards of population 
equality, but the projected reapportionment of seats among 
14 States could not be mitigated. 

The Bureau's estimates showed that eight States would 
probably gain seats in the U.S. House and that six would lose 
as a result of the reapportionment following the 1980 census. 
The reaction to such an announcement was predictable. 
Those representing States predicted to gain seats were elated; 
those representing the losers were understandably agitated. 
The implications are more than just personal problems or 
opportunities for current members of Congress. 

In 1964, in a packet of nine cases headed by Reynolds v. 
Sims, the "one-man-one-vote" rule as it applied to legislative 
bodies was announced. By the 1966 election, virtually every 
congressional and legislative seat had been affected by the 
judicial orders for new districting plans [2] . Coincident with 
the Goldwater presidential defeat of 1964 had been the loss 
of 541 Republican-held seats in State legislatures [2] . After 
the 1964 elections, Republicans only controlled the legisla- 
tures in six States and one house in nine [2] . 

In 1979, the Democrats are still in control of more than 30 
State legislatures. The legislative elections of 1980 will be 
the last chance for either party to gain control before the 

redrawing of congressional district boundaries following the 
1980 census. Changes in the number of congressional seats can 
signal shifts which could affect control of a State's 
legislature. Nine of the Nation's ten most populous States 
will probably gain or lose at least one congressional seat. 

The Undercount and Reapportionment 

Normal political concerns with reapportionment and 
redistricting would be aggravated by a census undercount, 
its measurement, and any possible adjustments. Since the 
publication of America's Uncounted People, concern regard- 
ing an undercount in 1980 has focused on measuring and 
adjusting for such an undercount. Most of the discussion of 
the impact of an undercount and potential adjustments has 
dealt with the apportionment of public funds. Relatively 
little has been said about the impact of an undercount on 
the reapportionment/redistricting process. The work of the 
Bureau and the testimony delivered before congressional 
committees have documented in some detail the shifts in 
the distribution in funds that would result from adjusting 
for the 1970 census undercount. The Bureau estimates that 
at most two congressional seats and four States could have 
been affected by adjustments for the 1970 undercount [11] . 
The Bureau's analysis of the impact of the 1970 undercount 
considers the possibility of shifts in State legislative districts 
and city council districts as small [9] . 

The Bureau's analysis of the impact of an undercount and 
adjustment on representation has been limited. The question 
of legal requirements to adjust for an undercount in 1980 has 
not been addressed. The Bureau has focused on adjustments 
to the census for use in the many Federal programs that 
involve disbursement of funds to States on the basis of popu- 
lation because this has been the focus of recent legislation. 2 
If the census is adjusted, how could a more accurate set of 
counts be ignored when congressional seats are apportioned? 
If an undercount is certain in 1980, as the Bureau's studies 
seem to indicate, must some adjustments be made if the 
courts' population equality standards are to be enforced? 
These are issues that have not been directly addressed by the 
courts or by reapportionment scholars. The impact of such 
issues is small, given the Bureau's focus on nationwide 
impact; but where such issues are important, most likely 
in the more populous States and in urban areas, there will 
be court challenges and there will be concerned State and 
congressional representatives. 


A census is defined as a count of population. It is gener- 
ally assumed by the public that such a count is accurate, and 
that no one is omitted or counted twice. This assumption is 
not realistic. In every census there is probably some error in 
the enumeration. The Bureau's analysis of the 1970 census 
count has provided an estimated undercount. Due to the size 
of that undercount and the characteristics of its distribution, 
the effect of that undercount fell disproportionately on 
those in heavily urbanized areas, those in minority ethnic 
groups, those in poverty areas, and those who represent the 
aforementioned. The Bureau's analysis and description of 
which groups were undercounted were clear enough for 
everyone to understand and use, and forceful enough to 
attract the attention of the press and the political commu- 
nity. With the 1980 count coming, people seem to be more 
aware of the possibility of an undercount. 

1 This number was specified as 435 in the Apportionment Act of 
Aug. 8, 1911 (37 Stat. 13). 

2 S. 1606, introduced by Mr. Moynihan July 31, 1979, directs the 
Secretary of Commerce, "In conducting the census. . .(to) adjust the 
population figures. . .to correct for undercounting." 

The impact of an undercount is something that is not as 
well understood. As was mentioned, the concentration has 
been on the impact on the distribution of public funds. This 
is understandable, given the facts. Funding is by its nature 
quantifiable and easily subject to statistical analysis and 
adjustment schemes. The distribution of funds goes on con- 
tinuously, as does population change, and funding changes 
can be made at various levels and in small units. Apportion- 
ment is discontinuous; it takes place at only a few levels, 
is handled in rather large indivisible blocks (seats), and has 
been subject to change only by court order. 

Apportionment of Seats 

Is the impact of an undercount, or an adjustment, on 
reapportionment significant? In 1975, Jacob Siegel of the 
Population Division of the Bureau of the Census analyzed 
the effect of an undercount on representation in legislative 
bodies at various levels of government [9] . Mr. Siegel 
constructed a table titled, "Minimal Model Values for the 
Number of Contiguous Legislative Districts and the Total 
Contiguous Population Required to Establish an Additional 
District, According to the Underenumeration Rate by Race, 
the Population Distribution by Race, and the Average 
Population Per District," which allows one to easily estimate 
whether or not a geographic area would qualify for an addi- 
tional congressional or legislative seat as a result of a 
correction of the census undercount [9] . 

On a nationwide level there is probably little impact of an 
undercount on representation, according to Siegel's table. If 
the general methodology used to produce this table is applied 
to a specific area, the results may be significant. The signifi- 
cance of such results will, of course, be increased if the 
individual interpreting those results has a special interest in 
that area and makes some assumptions regarding under- 
enumeration rates in that area. 

Using New York City as an example, Siegel's table would 
indicate that an adjustment for the undercount in 1970 
would not produce additional representation. Using New 
York City's 1970 population of 7.894 million and its black 
population as approximately 22 percent does not produce 

another congressional seat to add to the city's existing 18. 
According to the table, with a 25-percent black population, 
a combination of 28 contiguous districts and a population 
of 13.141 million would be required to produce an addi- 
tional district, if average underenumeration rates are applied 
(1.9 percent for whites and 7.7 percent for blacks). If high 
underenumeration rates are applied (2.2 percent for whites 
and 9.2 percent for blacks), the city would need 24 contig- 
uous districts containing a population of 11.020 million to 
yield another congressional district. 

Applying a simple assumption and some additional infor- 
mation to this table produces a different result. New York 
City has large populations of minorities other than blacks. 
The Hispanic population is the largest minority population 
next to the black population. The representatives of the 
Hispanic community have speculated that the under- 
enumeration rate of Hispanics is equal to or greater than that 
estimated for blacks. The Bureau has estimated that "the 
coverage level of the Hispanic population in 1970 falls 
between that of the white and black populations" [12] . 

Assuming that all groups other than whites were under- 
enumerated at a rate equal to the underenumeration of 
blacks, an additional congressional seat is produced for New 
York City, a result consistent with the city's own estimates 
of the 1970 undercount. With a population 32 percent other 
than white, the high underenumeration rate portion of 
Siegel's table would cause one to speculate that an additional 
congressional seat for New York City would be possible. 
If a calculation is made on a 7.894 million population with 
a 32- and 68-percent nonwhite/white split, at the high 
underenumeration rates the result is an undercount of 
approximately 351,000. Allowing for variations in the size 
of districts and the priorities produced under the method 
of "equal proportions" used to determine the number of 
Congressmen apportioned to each State, 351,000 would very 
likely produce another congressional seat. 3 Given the popu- 
lation ratios of New York State's Senate and Assembly 
seats (304,023 and 121,609, respectively), at least three 
additional legislative seats could be allocated to New York 
City. 
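The approximately 351,000 figure can be reproduced by applying each group's underenumeration rate to its share of the counted population. (Whether the rates should instead apply to the unknown "true" population is a methodological choice; the reading sketched here matches the figure in the text.)

```python
counted = 7_894_000                        # New York City, 1970 census count
nonwhite_share, white_share = 0.32, 0.68   # split assumed in the text
nonwhite_rate, white_rate = 0.092, 0.022   # "high" underenumeration rates

# Estimated persons missed, applying each rate to the counted population.
undercount = (counted * nonwhite_share * nonwhite_rate
              + counted * white_share * white_rate)
# approximately 351,000 persons under these assumptions
```

Dividing that undercount by the Senate and Assembly ratios quoted above is what suggests the additional legislative seats.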

These calculations are presented as the type of analysis 
the States and municipalities will develop in responding to 
predictions of an undercount. The Bureau's estimate that 
only two congressional seats would be shifted (affecting four 
States; New York was not one) and that the impact on State 
legislative representation would be small is not what the 
individual States and municipalities will project. In the case 
of State legislatures, the statewide population count as well 
as small area counts can affect both the number of seats in 

3 "Under the method of 'equal proportions,' the method used to 
determine the number of Congressmen from each State in the U.S. 
House of Representatives, the shift in the population of a State 
required to produce a change in the State's representation may be 
merely a few hundred persons or a few hundred thousand persons, 
depending on the precise populations of all States." (See ref. 8, p. 


a house and the drawing of district boundaries. In New York 
State, the size of the Senate is not fixed and can vary with 
population changes. 4 The "block on border" rule in New 
York requires that the population of each individual city 
block be checked to produce the most precise district 
population equality, which greatly affects the drawing of 
State legislative districts. 

Technical Problems 

The Bureau's analysis of the undercount and its impact 
on representation surfaces a number of technical issues that 
will create potential problems. While those who develop 
legislation can simply stipulate that the Bureau employ the 
best available methodology to correct for undercounting, 
the Bureau's task is not so easy and the result is unlikely to 
be universally accepted as the "best." The Bureau's problem 
is well summarized in the introduction to one of its many 
technical reports. 

Establishing the exact or even the approximate extent 
of underenumeration is much more difficult than dis- 
covering that such a problem exists [10]. 

Subnational estimates of undercount and resulting adjust- 
ments are likely to be the Bureau's largest problem. While the 
apportionment of Congress could be carried out with only 
national and State population counts, the drawing of district 
boundaries requires counts for small geographic areas. A 
1-percent standard for congressional districts of 500,000 
makes a small area of 5,000 persons significant. These figures 
must, of course, be precise (ranges will not do) and agree 
with State totals. State representatives will assume and 
expect the same accuracy, reliability, and confidence the 
Bureau indicates for its national and State totals. While the 
Bureau appears well aware of the public's mistaken presump- 
tion that such counts and adjustments will easily follow from 
the methodology for national coverage estimates, the Bureau 
should be aware that State laws may require such data for 
redistricting purposes. An example of such need is New York 
State's rejection of the use of estimates for small areas based 
on sample data. In New York State in 1949, the Legislature's 
Reapportionment Committee contacted the Bureau regarding 
a method of taking a census of citizen population of the 
State. The Bureau offered to determine citizen population 
based on a 20-percent sample, but the proposal was rejected 
and an actual census of citizens, with city and town block 
counts, was contracted for and delivered in 1951 [3] . 

The lack of uniformity in underenumeration rates will 
prohibit the blanket use of such rates in making adjustments 
to small geographic areas. As Siegel correctly pointed out: 

Under an apportionment formula, if the apportionment is 
based entirely or primarily on population and if the rate 
of underenumeration is the same from area to area, the 
results of such apportionment would be essentially 
unaffected by any undercoverage [12]. 

4 New York State Constitution, Article III, section 4. 

Mr. Siegel points out that rates are not uniform and are 
higher for minority populations and probably for urban 
areas. There appears to be a general awareness of this lack of 
uniformity and the need for adjustments to be tailored to 
local characteristics. Without this tailoring, adjustments 
would be useless from the point of view of legislators and the 
courts for use in reapportionment. 

The stability of estimates of underenumeration and 
adjustments is likely to present problems. In the Bureau's 
efforts to develop an expected "true" population, estimates 
of coverage have been issued a number of times. Discussions 
of various techniques and updates of estimates in 1974 dealt 
with changes of up to ±0.1 percent in undercount rates. 
While this is a relatively small amount, especially when 
dealing with a population of over 200 million, it still repre- 
sents 0.2 million people. In 1977, the Bureau offered alterna- 
tive estimates of net underenumeration at the regional and 
State levels [11]. These estimates were sufficiently different 
to produce different changes in representation in different 
States when the data were applied to the equal-proportions 
methodology. One series of estimates produced a change of 
one seat between Tennessee and Oklahoma and another 
produced changes of two seats involving California, Texas, 
Ohio, and Oklahoma. While this change may be considered 
small, at what point in time does the country accept 
estimates as final for purposes of apportionment? 

Even if an adjustment of the 1980 census is agreed upon 
for reapportionment of Congress at the national level, and it 
stands in the courts, the ability of the Bureau to produce 
counts in sufficient detail to meet the technical requirements 
for drawing districts is in doubt. The question of whether 
data could be produced to meet the legally required technical 
aspects of State legislative redistricting can only be answered 
by surveying the individual States. My reading of summaries 
of constitutional requirements of various States shows no 
State requiring block level data, but many States have strict 
equality and timing requirements. Reviewing fewer than a half 
dozen constitutions directly showed that only New York 
required census counts on the block level, which would be 
an impossibility given the present state of the art in 
producing adjustments. 

Court Cases 

In analyzing questions of reapportionment and redis- 
tricting, the discussion has depended on court cases and 
population statistics. Since there is no body of law dealing 
specifically with the need for or use of estimates or adjust- 
ments by the Bureau in congressional apportionment, I have 
looked for related cases that might set a precedent or be 


indicative of the courts' leanings. Cases were selected which 
dealt with the questions: 

Is it legally necessary to have an adjustment if an under- 
count is certain? 

Would the courts require new apportionment or redis- 
ricting plans if an adjustment was available? 

Three cases were found that dealt indirectly with these 
questions. While I feel there may be other cases that I have 
not located that also deal indirectly with these questions, the 
three cases presented probably offer as much insight into the 
possible actions of the courts in the eighties as will any 
others. 

The three cases that follow all tend to discount the value 
of census adjustments and refinements of population counts 
in the reapportionment process. In the case of Asbury Park 
Press, Inc. v. Woolley, 33 N.J. 1 (1960), the New Jersey 
Supreme Court heard a taxpayer's action seeking declaratory 
judgment that the 1941 New Jersey apportionment law was 
unconstitutional in light of 1950 census figures. The court 
ruled that since the New Jersey 1941 apportionment had not 
been challenged in 8 years and since the 1960 census figures 
would be available before the 1961 election, the court would 
defer declaration to allow the legislature to reapportion 
itself. In addition, the court stated that the results of pre- 
liminary counts customarily released by the Census Bureau 
would be sufficiently accurate for the New Jersey General 
Assembly to proceed to redistrict in an intelligent manner, 
provided the counts included data broken down by counties, 
towns, and wards. The New Jersey court cited the earlier 
Connecticut case Cahill v. Leopold, 141 Conn. 1 (1954), in 
which the Connecticut Supreme Court ruled that: 

It is not necessary that the information be published in 
book form before it becomes officially available. Indeed, 
there is not even a constitutional provision requiring the 
figures to be final. While final tabulations tend to greater 
exactitude than those previously computed, there is no 
need for the precision of perfection. The results of the 
preliminary counts customarily released by the Census 
Bureau, as happened in the case at bar, are ample to 
afford sufficiently accurate data for an Assembly to pro- 
ceed to redistrict in an intelligent manner, provided the 
counts have been broken down into counties, towns, and 
wards. (103 A. 2d, at pages 823-824) 

In the case of Koziol v. Burkhardt, 51 N.J. 412 (1968), 
the court ruled that the legislature was not required to use a 
1967 population estimate to make a 1968 amendment to a 
1966 redistricting act based on 1960 census data. The court 
ruled the legislature was free to use the 1960 census on 
which the act was based, but did not consider whether the 
legislature could use such interim estimates. 

In the session of November 8, 1976, the U.S. Supreme 

Court affirmed the judgment of the U.S. District Court in 
the case of Republican Party of Shelby County, Tenn. v. 
Dixon, USDC W. Tenn, 3/25/76. In a summary action the 
Supreme Court ruled: 

In determining validity of congressional districting Federal 
district court is not confined as matter of law to 1970 
Federal census figures, but may consider reliable popula- 
tion estimates made since then; Federal decennial census 
figures will, however, be controlling, unless there is "clear, 
cogent, and convincing evidence" that they are no longer 
valid and that other figures are valid; neither post-1970 
Census population figures prepared by National Planning 
Data Corporation nor "provisional estimates" of Census 
Bureau meet such test; reapportionment of sixth, seventh, 
and eighth congressional districts of Tennessee is ordered 
on basis of 1970 Federal census figures [7] . 

If I can be allowed to interpret these cases and produce 
some general conclusions, they would be as follows: 

1. An otherwise valid district plan need not be invalidated 
because more recent population data are available. 

Asbury Park Press, Inc. v. Woolley, 
33 N.J. 1 (1960) 

This is contingent on the assumption that a new 
districting plan will be generated as a matter of course 
when the next decennial census becomes available. 

2. Decennial census counts for use in reapportionment 
need not be those eventually published as the final 
counts. 
Asbury Park Press, Inc. v. Woolley, supra. 

State constitutions don't generally specify "final" 
census counts, accepting instead that when the census 
becomes "officially" available for public use, it is ac- 
curate enough. 

3. Population counts must be broken down in municipal 
units small enough to meet requirements for drawing 
of district boundaries. 

Asbury Park Press, Inc. v. Woolley, supra. 

4. Adjustments to redistricting plans based on census data 
can be made based on that census data and need not 
be based on more recent estimated population data. 

Koziol v. Burkhardt, 51 N.J. 412 (1968) 


5. In determining the validity of a districting plan, the 
court is not confined to decennial census figures. 

Republican Party of Shelby County, 
Tenn. v. Dixon, USDC W. Tenn., 3/25/76. 

6. Decennial census figures are to be used unless clearly 
no longer valid, and better figures are available. 

Republican Party of Shelby County, 
Tenn. v. Dixon, supra. 

7. Projections from a decennial base and the Census 
Bureau's "provisional estimates" of population are 
not clearly better than decennial census figures. 

Republican Party of Shelby County, 
Tenn. v. Dixon, supra. 

To summarize these conclusions, I would say that the 
courts would not require adjustment, or redrawing, of a 
redistricting plan that had been validated by the court or 
had not been challenged because of changes in population 
as evidenced by the types of estimates, updates, or adjust- 
ments to the census which the Bureau currently issues. 

If an undercount is certain, but it is agreed that it cannot 
adequately be measured or adjusted, the courts would have 
no choice but to accept the census as taken. If there is a 
general agreement on the measurement of the undercount, 
and an adjustment for use in distribution of funds is pro- 
duced and accepted by the Federal and State governments, 
the courts could accept such figures as valid and require 
that they be used in reapportionment, or find that they 
evidence a violation of the one-man-one-vote principle and 
invalidate existing plans. 

This does not answer the questions posed in the search for 
cases. The only thing that is certain is that the courts have 
kept their options open. 


The Census Bureau is currently considering whether or 
not it should adjust the 1980 census count by allocation of 
the estimated uncounted population. The Bureau is also 
considering the methodology to be used and the extent of 
the adjustment. Before any conclusions regarding the impact 
of an adjustment on reapportionment and redistricting can 
be reached, the Bureau's decision must be considered. 

Despite its reliance on population statistics, the 
reapportionment/redistricting process is essentially legal in 
nature. Whether this is due to the environment in which the 
process takes place or the people who usually carry it out, it 
is treated as a legal question of political representation. The 
efforts of government reform groups to change the 
reapportionment/redistricting process are always through 
the courts, and their proposals involve changing who manages 
the process and how. No one has proposed that the process 
be treated as a completely technical one and be assigned to 
a computer. 

The Bureau is responding to the concerns it now hears 
expressed. After an adjustment is made, the issue will not be 
the need for an adjustment but the result and its impact. If 
the Bureau decides to adjust the census, it will bear the 
burden of justifying the action and the methodology. If the 
Bureau decides against an adjustment, it will be criticized, 
and possibly the Congress will enact legislation requiring an 
adjustment. 

In the case of the mid-decade census, the Congress did 
act. The discussion of a mid-decade census predates the 
establishment of the Census Bureau [1]. The Bureau is 
required to take a decennial census and is permitted to 
produce current population estimates on an annual basis. The Congress 
ended discussion in 1976 by requiring a mid-decade census. 
Accordingly, the Congress produced legislation that details 
the intended use of the results of such a census. 

While the requirement of a mid-decade census may mean 
more work for the Bureau, the Congress has shouldered 
some of the responsibility for its conduct and its results. The 
legislation is interpreted by the Bureau as "flexible," allow- 
ing the Bureau to determine methodology [1]. This 
flexibility also allows the Congress to determine the scope of 
the mid-decade census both by direct review and by provi- 
sion of funding. The scale of the Bureau's programs (the 
current proposal is for a limited sample survey) will be set 
with congressional review and approval. 

The Congress has also specified that: 

(IF) in the administration of any program established by 
or under Federal Law which provides benefits to State or 
local governments or to other recipients, eligibility for or 
the amount of such benefits would be determined by 
taking into account data obtained in the most recent 
decennial census, and (IF) comparable data are obtained 
in a mid-decade census conducted after each decennial 
census, then in the determination of such eligibility or 
amount of benefits the most recent data available from 
either the mid-decade or decennial census shall be 
used [6] . 


Information obtained in any mid-decade census shall not 
be used for apportionment of Representatives of Congress 
among the several States, nor shall such information be 
used in prescribing congressional districts [6]. 

The result of these congressional stipulations is that the 
Bureau may face some questioning of its 1985 population 
counts by those whose general revenue-sharing funding drops, 
but the court challenges regarding congressional district 
reapportionment will be directed at Congress. In addition, 
while this clause may protect Congress from a 1985 reappor- 
tionment (and since reapportionment is a fate worse than 
death, I'm sure that Congress feels it will), it does not pro- 
hibit the use of the mid-decade census in challenging State 
legislative district plans. Given the rate at which our Nation's 
population is changing, there will certainly be challenges to 
a State representation plan unless this prohibition is 
extended, but the lobbying and complaining will be directed 
at the Congress, not the Bureau. If this prohibition is ex- 
panded, or if State constitutions specify decennial redis- 
tricting, or if the Bureau does not produce data for small 
geographic areas, there should be no questioning of the 
Bureau's 1985 product. If States are subject to redistricting 
as a result of the 1985 count and, even worse, if the courts 
invalidate this prohibition, then the Bureau will find itself 
the focus of questions regarding the methodology and 
validity of the 1985 counts. 

If the Bureau decides it must adjust the 1980 count, or 
the Congress mandates an adjustment but provides no details, 
the mechanism and methodology of the adjustment will 
determine whether or not the adjustment will impact the 
reapportionment/redistricting process. If it is possible to 
estimate the undercoverage, adjust the count, and incorpo- 
rate the results into the census as it is officially released 
and used for apportionment of Congress and Federal aid, 
there is a good chance that the adjustment would be gener- 
ally accepted and upheld by the courts. Of course, the 
adjustment would have to apply to sub-State counts as well 
as the national and State totals. 

As the result of an adjustment creates a greater gap 
between the figures available and the concept of a single 
census (issued as the one set of true numbers), the amount 
of challenge will grow. In the allocation of funds many 
factors are involved, the numbers are large, comparisons 
are vague, and Federal funding is not easily characterized 
to the public as a zero-sum situation. In reapportionment, 
those involved know that the number of seats is limited and 
small, and that for someone to gain, someone must lose. If the 
difference between the adjusted census and the true census 
is reasonable (which means that it appears valid, has wide 
academic support, the methodology is not under constant 
attack, and its presentation is trouble-free, and this applies 
at all levels of geography), the adjusted count should be 
accepted by the public and considered valid and better by 
the courts than the unadjusted count. When the gap widens 
to the point where the court considers the adjusted count 
invalid or no better than the unadjusted count, the potential 
for problems disappears as the unadjusted count again be- 
comes controlling for the purposes of reapportionment and 
redistricting. 
The greatest challenge to the use of an adjusted count for 
reapportionment purposes would result if the courts ac- 
cepted an adjusted count as valid, but the public considered 
the adjustment unreasonable. Such a situation seems unlikely 
given that the adjustment, as we are considering it, would not 
be congressionally mandated nor required for apportionment 
of Federal aid, making it unlikely that the court would 
accept an adjustment as valid unless it was generally accepted 
by the public. 

Given the consideration and preparation the Census 
Bureau has put into its decisionmaking process, it seems 
unlikely that an adjusted count would be accepted as valid 
by the courts for reapportionment and redistricting purposes 
and at the same time generate more problems and complaints 
than a census which is known to be an undercount. An 
adjustment will change who complains. Different States and 
municipalities will be affected by an adjustment than those 
who suffer from an undercount. The magnitude of the 
problem should be predictable if the Bureau has confidence 
in its 1980 estimates and the methodology it plans to use. 

On the congressional level, the number of seats at stake 
is fixed. While there may be a change in which States lose 
and which gain, I would speculate that the currently pre- 
dicted losers will simply lose a little less and gainers gain a 
little less. State legislatures will also be in a trade-off 
situation, with potential gains and losses depressed. 


REFERENCES

1. Argana, Marie, and Levine, Daniel B. "Countdown to 
1985." American Demographics, 1, No. 10 (Nov./Dec. 
1979), 42. 

2. Dixon, Robert, Jr., Democratic Representation, Reap- 
portionment in Law and Politics. Oxford University 
Press, 1978, 167. 

3. Joint Legislative Committee on Reapportionment. Re- 
port of the Joint Legislative Committee on Reapportion- 
ment. State of New York Legislative Document, No. 98, 
Nov. 17, 1953. 

4. Keefe, William, and Ogul, Morris. The American Legis- 
lative Process. Prentice-Hall, 1977, chapter 3. 

5. National Conference of State Legislatures, Committee 
on Ethics, Elections, and Reapportionment, Reappor- 
tionment Subcommittee. Federal Case Law and State 
Legislative and Congressional Districting, A Brief Sum- 
mary of Federal Court Precedents Likely to Govern 
Post-1980 Redistricting by the States. 1979. 

6. Public Law 94-521 (H.R. 11337); Oct. 17, 1976, section 
141, para. (e)(1). 

7. United States Law Week, 45, LW 3340 (11/9/76). 

8. "Toward a System of 'Fair and Effective Representa- 
tion,' A Common Cause Report on State and Congres- 
sional Reapportionment." Common Cause (November 

1977), 12-15. 

9. U.S. Department of Commerce, Bureau of the Census. 
"Coverage of the Population in the 1970 Census and 
Some Implications for Public Programs." Current Popu- 
lation Reports, Special Studies, Series P-23, No. 56. 
Washington, D.C.: U.S. Government Printing Office. 

10. U.S. Department of Commerce, Bureau of the Census. 
"Coverage of the Hispanic Population of the United 
States in the 1970 Census." Current Population Reports, 
Special Studies, Series P-23, No. 82. Washington, D.C.: 
U.S. Government Printing Office. 

11. U.S. Department of Commerce, Bureau of the Census. 
"Developmental Estimates of the Coverage of the Pop- 
ulation of States in the 1970 Census: Demographic 
Analysis." Current Population Reports, Special Studies, 
Series P-23, No. 65. Washington, D.C.: U.S. Government 
Printing Office, 1977. 

12. U.S. Department of Commerce, Bureau of the Census. 
Evaluation and Research Program Estimates of Coverage 
of Population by Sex, Race, and Age: Demographic 
Analysis. Washington, D.C.: U.S. Government Printing 
Office, 1974. 

13. U.S. Department of Commerce, Bureau of the Census. 
Commerce News (August 1, 1979), CB79-137. 


Richard Smolka 

American University 

The points that Mr. Carlucci makes about the impact of 
adjustment on politicians are important. To carry the issue 
further, from the State legislative to the local level, there are 
over 500,000 elected public officials who are chosen by 
ballot in perhaps 100,000 districts. The impact of two sets of 
figures on this group would be great, as would the impact of 
timing (i.e., when, after the census, adjustments are made). If 
the census is indeed adjusted, can a more accurate count be 
ignored when reapportioning or redistricting? Can this issue 
be resolved by passing a law? 

Whatever is decided about adjustment will inevitably be 
challenged. In one of the cases cited, Shelby County v. 
Dixon, it was held that redistricting below the congressional 
level is not to be confined to the use of census figures alone. 
Reformers, both inside and outside of government, will use 
this approach with emphasis on the rights of groups including 
minorities. There also have been a number of cases dealing 
with dilution of the vote. Under section 5 of the Voting 
Rights Act, for example, plans require Federal preclearance, 
and the State and local governments have the burden of 
proving that any change is nondiscriminatory. Adjustments 
to census counts certainly will affect the way the Justice 
Department will evaluate plans. With reference to those State 
and local governments not covered by the Voting Rights Act, 
questions are being raised in the courts as to whether at-large 
elections dilute the vote or are discriminatory against 
minorities or low to moderate income persons. In the 
Pasadena, Calif., test case, a possible solution might be to 
increase the number of districts, and this will in turn raise the 
pressures for data. 

So much depends on the population base— that is, when, 
where, and how the figures are published. The legal con- 
sequences of adjustment are unclear, but the political 
consequences are that 250,000 to 300,000 State and local 
officials will be very concerned. Whatever figures are used 
will be subject to further change at the next census. It is 
obvious that the political uncertainties of the adjustment 
issue will touch every local government unit. 





Adjustment for Census Underenumeration: 
The Australian Situation 

Brian Doyle 

Australian Bureau of Statistics 


Before explaining what has been done in Australia with 
regard to underenumeration in the census, it is necessary to 
give some background on Australia and its political system. 

Australia has six States and two Federal Territories. The 
Federal parliamentary system is modeled very closely along 
the lines of the Westminster system. The main difference is 
that the upper house (the Senate) is democratically elected, 
with each State having equal representation. Senators are 
elected for a 6-year term, with half the Senate facing 
reelection every 3 years. The Members of the House of 
Representatives (the lower and more important house) are 
elected for a 3-year term on a "one-man-one-vote" 
principle. In addition to the Federal parliamentary system, 
there is in each State (generally) a similar system of an upper 
and lower house of Parliament. 

There are only 11 cities of 100,000 people or more. These 
account for 70 percent of the total population of Australia 
but less than 1 percent of the land mass. Western Australia, 
while containing only 1.2 million people, is roughly 30 
percent of the land area of the United States (including 
Alaska and Hawaii). There are approximately 900 elected 
local government authorities, covering areas ranging in 
population from a few hundred to 700,000 people. 

The Federal Government collects all personal income tax 
revenues, which account for about half of the total Federal 
budget receipts. The remainder is made up mainly from 
company taxes and indirect taxes. About 35 percent of the 
Federal outlay is in the form of cash benefits to persons 
(mostly welfare payments— age pensions, unemployment 
benefits, etc.), with a similar amount being grants to State 
governments. (This is discussed in detail later.) 

These grants represent about 60 percent of State govern- 
ment income. Grants from State and Federal governments to 
local government authorities (LGA's) account for about 20 
percent of local government receipts and about 2.5 percent 
of State government outlays. 

This potted description should provide sufficient back- 
ground information to enable an understanding of the 
following paper and why adjustment for the undercount was 
undertaken in Australia. 


Population statistics are required for a variety of purposes, 
not the least of which are the legislative requirements. In this 
section, I will discuss the need for population statistics for 
electoral distributions and fund allocation, and the con- 
sequent legislative requirement for a census every 5 years. 

Requirement for a Quinquennial Census 

The "Census and Statistics Act of 1905" as amended in 
1977 requires that: 

"The Census shall be taken in the year 1981 and in every 
fifth year thereafter, and at such other times as are 

Prior to this being included in the act in 1977, the 
requirement was "every tenth year or at such other time as is 
prescribed." However, a census has been conducted quin- 
quennially since 1961. 

Table 1. State Size, Population, and Representation for Australia: September 1978 

[Table data not recoverable from the source; it showed, for each State and Territory, the area (thousand km²), the population, and the number of Federal Members of the House of Representatives.] 

Three important points arise out of this. 

1. There is a statutory requirement to conduct a census 
every 5 years; 

2. The Australian Bureau of Statistics (ABS) could be 
required to conduct a census more frequently than 
every 5 years; and 

3. The requirement is under the Census and Statistics 
Act, not written into the Constitution or into a more 
general act. 

Electoral Requirements for Population Estimates 

The main incident that led up to the amendment of the act 
in 1977 was a High Court decision in 1976 that certain 
aspects of the Electoral and Representation Acts were in- 
valid. These were concerned with the determination of the 
number of electoral seats in the Federal House of Repre- 
sentatives, and the decision indicated that there needed to be 
an electoral redistribution within the life of every Parliament, 
i.e., at least every 3 years. The determinations of State 
representation are based on the "latest statistics of the 
Commonwealth," which have as their basis the most recent 
population census data. In view of this, the Government 
considered it necessary to require the ABS to conduct a 
census at least every 5 years. 

In Australia, population estimates are used to determine 
the number of Federal electoral divisions (i.e., the number of 
Members of the House of Representatives) in each State. The 
individual electorate boundaries are, however, determined on 
the basis of the number of registered voters, such that no 
electorate may differ from the average— in terms of number 
of registered voters— by more than ± 10 percent. Population 
is not, therefore, an important, direct element in determining 
the geography of an individual electorate, although the 
number of registered voters is. 

The primary reason the census is required quinquennially 
is, however, to ensure that each State's representation 
reflects up-to-date population numbers. 

Fund Allocation Requirement for Population 

As mentioned earlier, a considerable portion (about 60 
percent) of State government revenue is in the form of 
disbursements from the Federal Government. A major 
portion (slightly less than half) of the grants are in the 
form of "Personal Income Tax Sharing Entitlements." 
The Federal Government has determined that a specific 
proportion of net (personal) income tax revenue be allocated 
to the States as general purpose grants. In 1976 and 1977, 
33.6 percent of net personal income tax revenues were 
allocated, and 39.8 percent for 1978 and 1979. 

In the allocation to the States, each State's share is 
determined with respect to an "adjusted population figure," 

where this "adjusted population" is defined in the "States 
Personal Income Tax Sharing Act of 1978" (PITS) as 


• In the case of Victoria, the estimated population of 
that State on 31 December in that year; 

• In the case of any other State, the estimated popu- 
lation of the State on 31 December in that year, 
multiplied by 

• in New South Wales 

• in Queensland 

• in South Australia 

• in Western Australia 

• in Tasmania 


In simple terms, per head of population, Tasmania receives 
slightly more than twice as much as Victoria receives. 
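The share calculation the act describes can be sketched in code. The multipliers below are illustrative stand-ins only (the act's actual values are not reproduced in this paper); Victoria's multiplier is 1.0 by the act's own definition, and Tasmania's is set near 2 to match the per-head observation above.

```python
# Sketch of the "adjusted population" allocation under the PITS act.
# All multipliers except Victoria's (1.0 by definition) are hypothetical.
MULTIPLIERS = {
    "New South Wales": 1.0,    # hypothetical
    "Victoria": 1.0,           # the act defines Victoria's adjusted
                               # population as its estimated population
    "Queensland": 1.2,         # hypothetical
    "South Australia": 1.3,    # hypothetical
    "Western Australia": 1.3,  # hypothetical
    "Tasmania": 2.0,           # hypothetical; roughly 2, per the text
}

def state_shares(populations):
    """Return each State's fraction of the tax-sharing pool: estimated
    31 December population times multiplier, normalized over all States."""
    adjusted = {s: populations[s] * MULTIPLIERS[s] for s in populations}
    total = sum(adjusted.values())
    return {s: a / total for s, a in adjusted.items()}
```

Because the shares are fractions of a fixed pool, raising one State's adjusted population necessarily lowers every other State's grant, which is why it matters that most allocations are a percentage of a fixed total rather than a per capita amount.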

There is nothing in the act to define what "population" 
means, except the act also contains: 

"Section 10. A determination made by the Commissioner 
under Section 6, or a determination made by the 
Australian Statistician under Section 9, shall, for the 
purposes of this Act, be conclusively presumed to be 

The population "multipliers" in the legislation area product 
of negotiation between the Commonwealth and State govern- 
ments and are subject to periodic review. Changes in relative 
allocations can be effected by amending legislation in the 
Commonwealth Parliament. The link between population 
estimates and relative allocations is, therefore, by no means 

States also receive money to be reallocated to local 
government authorities. The relevant act, "Local Government 
(Personal Income Tax Sharing) Act 1976," specifies the 
manner in which the money is to be allocated: 

A State shall 

(a) allocate not less than 30 per centum of the amount to 
which it is entitled under Section 5 in respect of a 
year amongst local governing bodies in the State on a 
population basis, that is to say, on a basis that takes 
into account the respective populations of those local 
governing bodies and may take into account the 
respective sizes, and the respective population 
densities of the areas of those local governing bodies 
and any other matters agreed upon between the Prime 
Minister and the Premier of the State as being relevant 
for the purposes of that allocation; and 

(b) allocate the remainder of the amount amongst local 
governing bodies in the State on a general equalization 
basis, that is to say, on a basis that has the object of 
ensuring, so far as is practicable, that each of those 
local governing bodies is able to function, by reasonable 
effort, at a standard not appreciably below the standards 
of the other local governing bodies in the State, being a 
basis that takes account of differences in the capacities 
of those local governing bodies to raise revenue and 
differences in the amounts required to be expended by 
those local governing bodies in the performance of their 
functions. 

It would appear that the legislation allows scope for an 
allocation of funds among local governing bodies that differs 
significantly from population relativities. 
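As a rough illustration of that two-part rule, the following sketch splits a State's entitlement into a population pool (30 percent here, the statutory minimum) and an equalization pool. The equalization weights are a hypothetical stand-in for the revenue-capacity and expenditure-need assessments the act actually contemplates.

```python
# Hypothetical sketch of the Local Government (Personal Income Tax
# Sharing) Act split: at least 30 percent of the State's entitlement
# shared in proportion to LGA populations, the remainder on an
# "equalization" basis (here a simple normalized weight; the real basis
# is a judgment about revenue capacity and expenditure needs).
def allocate(entitlement, populations, equalization_weights, pop_share=0.30):
    pop_pool = entitlement * pop_share
    eq_pool = entitlement - pop_pool
    pop_total = sum(populations.values())
    eq_total = sum(equalization_weights.values())
    return {
        lga: pop_pool * populations[lga] / pop_total
             + eq_pool * equalization_weights[lga] / eq_total
        for lga in populations
    }
```

Under such a rule, a small LGA with a weak revenue base can receive a grant well above its population share, which illustrates how allocations may diverge from population relativities.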


Following the production of the first preliminary results 
from the 1976 census, it was realized from comparison with 
postcensal (1971) estimates that the census had missed 
significant numbers of the population. The postenumeration 
survey (PES) conducted after the census confirmed this. 
After considerable analysis of other demographic informa- 
tion, it was decided that a "better" population estimate 
would be obtained if an adjustment was made to the census 
results for underenumeration as measured in the PES. The 
estimate so derived was, given the deficiencies of the various 
collections, sufficiently close to the population derived from 
demographic analysis as to make the PES estimate acceptable. 

It was realized that it would be unrealistic to use a small 
(two-thirds of one percent of households) survey to adjust all 
census results, especially those for small areas. 

Therefore it was decided that: 

(a) A clearer distinction would need to be made by the 
ABS between "census results" and "population 
estimates." Population estimates are based on census 
results but are adjusted for underenumeration. 
Population estimates are produced 

• Annually, showing the total population for each 
State; 

• Annually, showing the total population for each 
State by age and sex; and 

• Quarterly, showing the total population for each 
State by sex. 

As well, a "civilian population 15 years and over" is 
produced for each State by sex and age on a monthly 
basis for use in estimation in the monthly labor force 
survey. These are projected estimates and are super- 
seded by the quarterly estimates when actual data be- 
come available. 

(b) Census results as such would not be adjusted for 
underenumeration. 

(c) Only basic population estimates would be adjusted. 
Requests for underenumeration adjustment in addi- 

tional areas or for other characteristics would be met 
by giving indications of underenumeration but not 
"officially" providing estimates. That is, the ABS 
would be prepared to give qualitative results from the 
postenumeration survey rather than quantitative. 

The decision to produce adjusted population estimates, 
while having significant political effects, was not politically 
influenced— it was purely an ABS decision as to what was 
technically best. The general acceptance of this decision is 
largely the result of the range of adjustment factors being 
relatively small and the fact that most allocations are based 
on percentage of a fixed total rather than on a per capita 
fixed amount. 

Our population estimates are a potpourri of concepts, and 
changes over recent years can best be summarized by table 2. 

There is no conceptual justification for the current series 
nor for the "prior to 1976 census" series. They are neither 
"de facto" nor "de jure" estimates but a mixture of both. 
The ABS is planning to move to a fully resident based series 
following the 1981 census. The procedure adopted after the 
1976 census thus produced two important changes to the 
population estimates: 

(a) The adjustment for underenumeration, and 

(b) The inclusion in estimates of Australians temporarily 
overseas (and the corresponding exclusion of visitors 
to Australia). This is largely to avoid the seasonal 
effect of tourism on population. See appendix A for 
table showing the seasonal pattern and the net effect 
of short-term movements. 

The "error" in the intercensal adjustment is, as best we 
can estimate, relatively small. As the 1971 census was subject 
to a different level of underenumeration (about 2 percent), 
we see from table 3 that the Australia total (column 1) is about 
one-fourth of the way between columns 2 and 3. The same cannot 
be said of the individual States, as levels of under- 
enumeration may have varied considerably for a particular 
State between 1971 and 1976. As well, since the de facto 
1971 census figures were updated only for permanent 
(internal) moves, any variation in the stock of visitors 
between 1971 and 1976 in a particular State would lead to 
apparent "errors" in any comparison. 


Once the decision to adjust was made, the procedures 
adopted were relatively straightforward. I shall discuss the 
production of State estimates first, then the local govern- 
ment authority estimates. 

State Estimates 

The underenumeration rates for each State (as measured in 
the coverage survey of the PES— see appendix B for a brief 


Table 2. Intercensal Estimates of Population 

(Column headings other than "Intercensal adjustment" are not recoverable from the source; the row entries as printed were:) 

Prior to 1976 census: de facto, as recorded; de facto; de jure; de facto 
1976 to 1981: de facto, adjusted for underenumeration; de facto; de jure; de jure 
Planned for post-1981 census: de jure, adjusted for underenumeration; de jure; de jure; de jure 

1 "Internal" refers to the population within Australia. 

2 "External" refers to population movements between Australia and the rest of the world. 
"External de facto" therefore means that Australians temporarily outside Australia would not be 
included in the Australian population, and overseas visitors to Australia would be included. 
"External de jure" means that such temporary movements would be excluded from calculations. 

Table 3. Population by State, June 1976 

[Table data not recoverable from the source; for each State it showed (1) the estimate carried forward from the 1971 census, (2) the 1976 census as recorded, and (3) the 1976 census adjusted for underenumeration, each expressed as a percentage of the column total.] 

description of the survey) were used directly to produce the 
estimates of total underenumeration for each State. Sex and 
age estimates for the State were then produced by de- 
mographic analysis; the PES rates were not used for anything 
other than the total population. 

For a historical series, the picture is very complex 
because of the effect of other conceptual changes (e.g., 
inclusion of aboriginals, exclusion of net short term visitors). 
However, the procedure was as follows. 

• Assume 1961 "as recorded" and 1976 "as adjusted" 
were "correct." 

• Assume 1971-1976 births, deaths, and migration were 
correct. This led to an implicit underenumeration in 
1971 of about 2 percent. The 1971 PES, which was 
known to have a number of serious deficiencies, had 
measured 1.35 percent. The 2 percent was, however, 
allocated according to the State underenumeration 
rates as measured in 1971, multiplied by 2/1.35. 
• The "error" between 1971 and 1961 was then spread 
over the 10 years. 
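The arithmetic of that back-adjustment can be illustrated with a small sketch. The State PES rates below are hypothetical, and treating a rate as a percentage of the true population is an assumption of this illustration, not a statement of the ABS procedure.

```python
# Illustration of the 1971 back-adjustment described above.  The implicit
# national underenumeration of about 2 percent is spread over the States
# by scaling each State's 1971 PES rate by 2/1.35, so the State rates are
# consistent with the nationally implied level.
IMPLIED_NATIONAL = 2.0   # percent, implied by demographic analysis
PES_NATIONAL = 1.35      # percent, as measured by the 1971 PES

def scaled_rate(pes_state_rate):
    """A State's 1971 PES rate after scaling to the implied national level."""
    return pes_state_rate * IMPLIED_NATIONAL / PES_NATIONAL

def adjusted_count(recorded, rate_pct):
    """Inflate a recorded count, where rate_pct is underenumeration
    expressed as a percentage of the (unknown) true population."""
    return recorded / (1.0 - rate_pct / 100.0)
```

For example, a State whose 1971 PES rate happened to equal the national 1.35 percent would be assigned the implied 2 percent rate, and a recorded count of 98,000 at that rate would be carried as an estimate of roughly 100,000.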

As said above, the procedure was complex, and for almost all 
external purposes a break in the series is shown before June 
1971, with a footnote: 

"The estimates for 1971 for each State and Territory are 
made from the 1971 census results, with augmented 
adjustments for underenumeration to make the total 
balance with the estimates for Australia made retro- 
spectively from 1976" (from 1979 Australian Year Book, 
p. 80). 


LGA estimates 

Corresponding LGA adjustments for underenumeration were 
made to bring them into line with State estimates. These 
adjustments are approximate, but I believe making them was better than 
doing nothing. As has been seen from the earlier discussion, 
population estimates are not a critical component of electoral 
distribution or fund allocation for LGA's. 

From the PES, estimates of underenumeration rates for 
the individual strata of the PES survey (there are approxi- 
mately 500 strata in the survey) were obtained. Within each 
State, these strata were grouped into relatively homogeneous 
groups such that 

1. The standard error of the percentage underenu- 
meration for the total of each group of strata should 
not exceed 25 percent; and 

2. The lowest underenumeration rate must be no more 
than 30 percent less than the highest under- 
enumeration rate within a group. 

The underenumeration rate for the grouped strata was then 
allocated to all LGA's (or parts) within the grouped strata. 
As a general rule, the grouped strata were along the lines 

• inner city (high underenumeration) 

• inner suburbs (moderate underenumeration) 

• "suburbia" (low underenumeration) 

• nonmetropolitan urban (low underenumeration) 

• nonmetropolitan rural (moderate underenumeration) 
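The two grouping rules can be expressed as a simple validity check on a proposed group of strata. The rates and standard errors would come from the PES; here the group standard error is combined from stratum standard errors with equal weights, and rule 1 is read as a relative standard error, both simplifying assumptions of this sketch.

```python
# Sketch of the stratum-grouping constraints described above.  Each
# stratum is (underenumeration_rate_pct, standard_error_pct); the
# equal-weight averaging and the relative-error reading of rule 1 are
# assumptions of this illustration.
import math

def group_ok(strata):
    rates = [r for r, _ in strata]
    n = len(strata)
    group_rate = sum(rates) / n
    # Equal-weight combination of stratum standard errors.
    group_se = math.sqrt(sum(se ** 2 for _, se in strata)) / n
    # Rule 1: standard error of the group's underenumeration <= 25 percent.
    if group_se > 0.25 * group_rate:
        return False
    # Rule 2: lowest rate no more than 30 percent below the highest.
    if min(rates) < 0.70 * max(rates):
        return False
    return True
```

A grouping pass would merge adjacent strata (inner city, inner suburbs, and so on) only while a check like this holds, then assign each group's pooled rate to every LGA, or part of an LGA, falling within it.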

The historical series of LGA estimates were adjusted back to 
1971 in virtually the same way as the State estimates and have been 
updated annually since 1976. No cross-classifications are 
adjusted at the LGA level. 


One question that has been raised in the documentation 
from the U.S. Census Bureau is "Has the credibility of the 
Bureau (and the census) suffered?" The answer is a cautious 
"no." I am cautious for two reasons: 

1. The ABS has not run a census since making the change, 
and I would expect that any criticism from the general 
public would not occur until census time. 

2. The ABS was severely criticised after the estimates 
were released, but this was not for what we had done, 
but rather for the fact that we did not explain to the 
users what the various estimates meant; we confused 
the users with the proliferation of estimates. Within a 
few months, users had to contend with: 

September 1976: Post-1971 censal estimates for June 1976 

Sept.-Dec. 1976: Preliminary 1976 census counts for LGA's for June 1976 

January 1977: "Final" State totals from preliminary census processing for June 1976 

February 1977: State estimates adjusted for underenumeration for June 1976 

March 1977: Release of LGA unadjusted estimates from preliminary processing 

June 1977: Release of LGA adjusted estimates 
I have no doubts that this confusion will be avoided 
following the 1981 census. I repeat that the adjustment itself 
has not come under criticism, and there has been no backlash 
on the ABS for making the adjustment. 

It would be foolish to say that the adjustment decision 
had not produced some problems for users, especially where 
they are using population estimates data and then need to 
disaggregate the analysis. In such situations, they are forced 
to use unadjusted census data, but they are no worse off than if 
the ABS had not adjusted. 


It is a fact of census taking that a census will always have 
errors, both in coverage and content. How important these 
errors are depends on their size and the uses to which the 
data are put. 

In Australia in 1976, the undercoverage was large enough 
by itself to discredit the census for one of its major 
legislative uses—the determination of electoral represen- 
tation. Adjustment was a necessary step, but it was an 
after-the-event decision. We had not anticipated the high 
level of underenumeration. 

It is important to stress that the detailed census results 
were not adjusted for underenumeration. Only the inter- 
censal population estimates series were adjusted to include 
the undercount estimate. The population estimates are very 
restricted in the amount of detail shown, but they are 
used in determining electoral representation and the alloca- 
tion of revenue to States. 

I see adjustment as a method of patching up holes. It is 
better, however, to prevent, if possible, the emergence of the 
holes in the first place. For 1981, we will be doing 
considerably more to improve coverage. However, under- 
enumeration will occur, and I feel that the ABS will continue 
to adjust population estimates for underenumeration. 

















[Table: figures for March 1973 through March 1978; the body of this 
table was garbled in the source and cannot be reconstructed.] 

The person coverage check (PCC) was conducted as part 
of the postenumeration survey. This survey was run 2 weeks 
after census night, and it was designed to produce estimates 
of net underenumeration of persons. The PCC was a 0.67- 
percent sample of private dwellings across Australia. 

Persons living in nonprivate dwellings (e.g., hotels, motels, 
hospitals) and sparsely settled areas were excluded from the 
postenumeration survey because of operational difficulties in 
conducting follow-up interviews. However, these groups amount 
to only about 5 percent of the population, and hence any 
underenumeration of them is unlikely to have a significant 
effect on the overall level of underenumeration. 

The postenumeration survey sought only a limited 
amount of information from sample households, i.e., sex, 
age, marital status, country of birth, and employment status. 
The results relating to total number of persons are subject to 
a standard error of approximately 0.04 percent at the 
Australian level, less than 0.1 percent at the State level, and 
less than 0.5 percent in the Australian Capital Territory and 
the Northern Territory. 

The randomly selected houses were approached by 450 
trained interviewers to determine the number of persons 
staying at each household on census night. The estimated net 
underenumeration was derived by comparing the people 
enumerated on census night as stated on the census schedule 
with the people living at that household on census night as 
enumerated by the interviewer. 
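The comparison just described reduces to a simple rate calculation. The sketch below uses hypothetical counts, not the actual PES figures:

```python
# Sketch of the net-underenumeration estimate from a PES-type match
# (hypothetical counts, not actual ABS figures): compare persons found
# by interviewers with persons recorded on the census schedules for
# the same sample dwellings.
def net_underenumeration_rate(pes_persons: int, census_persons: int) -> float:
    """Net undercount as a percentage of the PES ('true') count."""
    return 100.0 * (pes_persons - census_persons) / pes_persons

# Hypothetical example: 30,000 persons found by interviewers,
# 29,190 matched on the census schedules.
print(f"{net_underenumeration_rate(30_000, 29_190):.1f}%")  # 2.7%
```

In practice the sample rate would then be weighted up to the population, subject to the standard errors quoted in the text.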

While every effort is made to minimize underenumeration 
in the census, some inevitably remains, for various reasons: 

1. Inadvertent omission of very young children, 

2. Persons missed because the dwelling was missed by the 
collector (out-of-date maps, ill-defined LGA 
boundaries, etc.), 

3. Treatment by the collector of an occupied dwelling as 
unoccupied, and 

4. Persons in occupied dwellings not wishing to be 
included on the census household schedule. 

Table B-1. Census Underenumeration as Shown by the 
Postenumeration Survey, by Age Group and Sex: 1976 

[Table body garbled in source; only the column heads "Females" and 
"All persons" and the row stubs "All ages" and "70 and over" remain 
legible.] 

Results from the Person Coverage Check. Overall, the 
PCC revealed that 2.7 percent of the people were missed in 
the 1976 census. Some general conclusions can be drawn 
from the analysis of the PCC file: 


1. The underenumeration rate for males was higher than 
for females (3.0 versus 2.4 percent). 

2. The underenumeration rate varied among age groups, 
ranging from 1.7 percent for the 10- to 14-year-old 
group to 4.7 percent for the 20- to 24-year-old group. 
In fact, the rate peaked at 6.2 percent for persons 
aged 20. 

3. Divorced people (5.3 percent), followed by people who 
are permanently separated, had a higher underenumera- 
tion rate than married people (2.1 percent). 

4. Persons born outside Australia (except in the United 
Kingdom) had a higher underenumeration rate (3.2 
percent) than persons born in Australia. 

5. Unemployed persons had an underenumeration rate of 
6.4 percent, compared with 3.0 percent for employed persons. 

Census Undercount: 

The International Experience 

Meyer Zitter and Edith K. McArthur 

Bureau of the Census 

A review of the practices of most of the developed coun- 
tries of the world relative to census undercount suggests that 
the United States does not have a monopoly on concern 
for complete coverage of the population in conducting its 
censuses and in measuring and evaluating their accuracy. 
However, the similarity stops there. The broad-based issues 
being addressed at this conference, the impact on public 
policy, the adequacy of existing methods in measuring com- 
pleteness of coverage, the legality and equity, and the con- 
cerns of a wide variety of policymakers and data users as 
expressed in the materials being presented here, do not 
emerge as being very important for most of the countries. 

The minor role played by census undercount as an issue of 
public policy is also confirmed by the paucity of interna- 
tional discussion and debate. As far as the writers know, 
census undercount and its implications for public policy have 
not been a full-scale agenda item of the regular meetings of 
the Conference of European Statisticians, of the Economic 
Commission for Europe (ECE) countries, or of the expert 
group meetings convened periodically to address specific 
topics of concern, although there is evidence of substantial 
interest in measuring and evaluating completeness of census 
coverage. For example, a review of agenda topics of the 
regular annual meetings of the Conference of European 
Statisticians shows great concern for the quality and precision 
of census results. Indeed, there are recommendations that 
the precision of census results be included as a topic for 
study and discussion, that quantitative evaluations and mar- 
gins of error be published, and that information be compiled 
concerning the methodology used by those conference 
countries that carried out coverage checks by com- 
paring census results with population registers. But there has 
been no meeting dedicated to a discussion of the broad 
concerns being addressed here. 

This conference, thus, is the first of its kind and may 
establish the model for other countries sure to follow in the 
years to come. 


This paper reviews the experience of other countries on 
the general issue of census undercount. The report has two 
separate components: First, a review is made of how selected 
countries, i.e., those participating in the Conference of 
European Statisticians, handled the general problem of 
census coverage and the undercount issue in connection 
with the last census. Second, a review is made of data for 
developing countries in the data files of the International 


Demographic Data Center (IDDC) at the Census Bureau to 
provide additional perspective on methods of measurement 
of coverage and on the level of census completeness in de- 
veloping countries. 

This paper is not intended as an in-depth study of inter- 
national practices. Rather, it is designed to provide tone and 
flavor as to the general level of concern of other developed 
and developing countries on the undercount issue. It is 
narrow and limited in scope for a number of reasons. In the 
first part of the study, inquiries were made only of countries 
participating in the Conference of European Statisticians 
operating under the auspices of the ECE. The study was 
limited to these countries because it was believed that the 
state of development of the statistical systems and the role 
statistics play in shaping public policy more closely parallel 
the state of the art and practices in the United States. All 
these countries were "developed" countries as classified by 
the United Nations. Furthermore, the review relative to the 
practices of the countries in the ECE relies primarily on 
information received in response to our inquiry raising the 
several questions detailed below. There were no attempts or 
opportunities for additional detailed correspondence or 
discussion, or for elaboration and clarification of a number 
of points relating to census coverage. There is also the ques- 
tion as to whether "all the right questions were asked" if one 
were trying to do a thorough analysis of the practices and 
concerns of countries relevant to census undercount. For 
example, we did not inquire as to the uses of census data and 
postcensal estimates which are important variables relative to 
the importance of the issue. (In the United States, for ex- 
ample, the major concern with census undercount did not 
emerge until the 1970's, as more and more money, Federal 
legislation, and political representation were being impacted 
by census results.) Nor could we benefit from personal 
discussion and input as in the case of Australia. However, 
with these limitations, the review is designed to tell us much 
about where many of the countries stand relative to the 
United States in the role that coverage issues play in census taking. 

Letters were sent to 25 countries (see appendix A) of the 
ECE region raising the following questions: 

1. Do you routinely measure the completeness of census coverage? 

2. What kind of methodology is used? 

3. How extensive is the geographic detail for which 
coverage estimates are prepared? 


4. Are the census data adjusted and if so, what specific 
uses are made of the adjusted series? 

5. Are census data (unadjusted) used for some purposes, 
whereas adjusted figures are used for others? 

6. Is the question of census undercount an important 
issue in your country? 

Replies were received from 24 countries and are sum- 
marized below in table 1. Narrative summaries for each 
country are given in appendix B. 

In addition to this review of the practices of the ECE 
countries, a separate review was made of the data available for 
developing countries in the demographic data files of the IDDC 
of the Census Bureau. The Center, as part of its responsibilities 
in compiling demographic data, routinely puts together what 
is known about the accuracy and reliability of census and 
survey data for the developing countries around the world. 
Material for some 25 countries (of 1 million or more popu- 
lation) in which postenumeration surveys or other evalua- 
tions of census coverage were undertaken is also summarized 
here to provide the conference with some general perspective 
on the levels of completeness of census coverage around the 
world. There are two kinds of figures shown— the "official" 
estimates of undercount, usually based on postenumeration 
surveys (PES) or other types of studies, and the "adjusted" 
levels of undercount, based on the IDDC review and evalua- 
tion. These data are presented in table 2. Source notes are 
given in appendix C. 




In response to the first item, "do they measure coverage 
completeness," 18 of the 24 countries that replied said "yes, 
some comparison study was done"; 4 replied that they did not 
measure coverage completeness. One country (France) said 
"yes," but referred mainly to a study conducted in 1962, 
and one country (Sweden) said "no comprehensive study," 
but coverage of housing units and economically active 
persons is measured. Furthermore, two of the countries that 
said "no" (Denmark and Norway) were depending primarily 
on population registers rather than a conventional census for 
census-type data. Thus, the majority of these countries do 
conduct some study of completeness of census coverage. 

As to "method of measurement" of census coverage, there 
are two main measures of coverage accuracy that emerge 
from the responses: A PES type of survey and comparisons 
with existing population registers. Of those countries that 
conducted coverage studies, over half have population 
registers, some of which are very extensive and compre- 
hensive. In Norway, the Netherlands, and Finland, the census 
is drawn from the register, and Denmark in 1980 will move 
to a "census" completed by compiling data using identifica- 
tion numbers from the several population registers rather 

than the traditional type census. Incidentally, in reviewing 
the materials on methods of measuring completeness of 
coverage, it appears that in some countries, measures of 
coverage are derived as byproducts of procedural and quality 
control operations carried out as part of the basic census 
instead of through a separate study. 

On the issue of "completeness of coverage," the material 
suggests a very low rate of undercoverage for the developed 
countries. The amount of indicated undercount runs from 
"negligible," indicated by Czechoslovakia, to a comment of 
"they have complete enumeration"for the U.S.S.R. However, 
this means in the U.S.S.R. that all errors discovered in the 
several postcensal operations, including their postenumera- 
tion survey, are corrected on the census records. In countries 
making comparisons with registers, it is not always clear 
whether the undercount measure is a net rate. Several of 
the countries appeared to consider that the census was more 
accurate than their register system. In the case of Italy, the 
register was 1.7 percent higher than the census count, but 
the inference is that it was less reliable than the census; the 
register failed to keep track of substantial emigration, 
especially from southern Italy. Finland's reference to a 2.7 
percent undercount of population meant that 2.7 percent 
of persons on the register did not respond to the census. 
For these persons, vital information was tabulated from the 
register; only labor force characteristics were not available 
for omitted persons. In Canada, the coverage methodology 
used was a reverse record check accomplished by tracing 
a sample of persons drawn from various records (between 
1970 and 1975) to ascertain whether they were recorded in 
the 1976 census. Only in Bulgaria and in Spain were there 
indications of an overcount— 0.06 percent and 2.3 percent, 
respectively. In Bulgaria this was due to an excess of double- 
counted persons over omitted persons. In Spain, the over- 
count was of the de jure population. The figures are cited to 
provide some idea as to coverage levels as measured and 
reported by these countries. No attempt was made to 
evaluate the reliability of the methods described or of the 
results obtained. 

With regard to the undercount differential, most of the 
countries reporting an undercount also indicate some dif- 
ferentials in coverage for different groups of the population 
or for different geographic regions. Austria, for example, 
with an overall reported undercount of 0.4 percent, indicated 
that foreign workers were undercounted by as much as 20 
percent based on comparison with work permits that had 
been issued. Also, certain types of persons, for example, 
young persons, single persons, and males, were more likely to 
be counted wrong in Finland, Bulgaria, and Canada. In 
Canada, also, the undercoverage of persons with a non- 
English mother tongue was higher than for the general population. 


Most of the countries provided measures of undercoverage 
at the national level only. Finland indicated figures available 
to the communal level, some 475 areas, and Sweden provided 
measures for regional/metropolitan areas. Thus, in general, 



" c 


CD c 

O -t- 

p. i? 












































— * 

























































"D co 

< E 






■— — 

-C o 

a £ -c 

CD 0) CD 

a* fj c? S 5 
™ — .S£ i- c co 

o) co -C 

1- C CO 

CD £ — 

> - i= CO 

O "i > 

o &> CO 


3 '*= 

O C 


!r i_ 


D "a 


i- CD 

CD >- 

«+- CO 


c c 

3 3 

O O 

a a 


XI > 

C O 

































*r to 




o fc 

C co 


■a — 

C 4-" 

o c 

°- 3 

to o 

CD (J 



co jj- c 

■s c ° •£ 

2 CD V5 CO CO 

.52 c co +; 

*•> c o C 




h: O 


O O | 1 ^ 

a. '*j to b o 





I » 

o 2 
Q co 


4- .2 

o ■- 















4- .2 

o i- 










° <* 

B o c 

»; cs o 

? .5> to _ "^ 

i: CD Jr CO CO 

3 i_ CD +j — 

vP O -* O 3 

6-- h- i- 4- Q. 

O s- g m- O 

cn o s o a 



























to .£- 

3 i_ 
co CD 

£ g 

co O 

p SS .<o 








O O 


in "o> 


O) co Jr J: 

|-S g a, 8 

=: g* I s -g 

» m « V 2 

5 J (I) ± O 



> s 
o °. 



















































































- .2 

O "- 








*- Np 
CD O^ 

•O ^ 
c O 










5 ? "2 1 

-f£ CD C Q. 

- a ra 2 <o 


• 8 S 
8-k ' 

> 5 <» 
g a a 
^ co E 
cocn g 


t to 

tD C 

T3 CD 

•9 -c fc 
to <S O 


X oi 
p to 


O) CD 

i.E g 

a> o cd "J 

o 5 j- .t: 

co O c 

_£ih t 

C •- TJ 0J 

j= g c a 

+-< CD CO u 

03 S tt O 

DJ — S 



g a 

CO 3 

S E 




m +T k- 


>- c 

3 CD 

CO Q_ 



"s » i 

o E5 

O CD *■ 

tD CO — i. 

I- u_ CD -, 







o 2 E 

-^ CO L. 








J5 1 















E c 

« .2 





o g 











O .J. co 

f"* "D "S 5 

en to SS .2 

*— -C jj +j 

■w to 

to CD 

4- l - 



E = 

> E 

o 2 


O 4-> 3 

> « S o 
« -° i3 a 

.t j M ■- 


■ £-£ 

o : co o 

E to E £ 


























0) c <2 

3 c o 
> <o a, 

























































5 E 


















































a> O 

J2 C 


I « 

c C CD 
C O — X) 

Q > £ g? 

O) O J- "O o — 

•— u a; c r- c 
X S a co t c 

a) c 

s » 

3 — < 

+»■ CD 

en Ol l- 

S g 3 

"- n ■- 














c c 

+-» C 3 

4- U> O 4- 4- 

o -g, *; 1> o -o 

^p a, T3 D ^ CD 

1 . C m 0) ' C 

CM O*: UN t 






<= CM 
3 ta 

--" .E 























S 05 




— (O 




3 3 


« ? m 

C '£ 'o _ 

CD O O 0) 

Q. > (A CO 



« CO Ol O- _ 

*± « ■- to t 

co *; V" 

c .t > 4- 

J3 i: o 



















































































for persons 
at place of 
birth— they 






















study is 






f - 





CJ w 

a 3 
CO to 

C 0) 

O to 
v> — 


° '5 

o S 





CD « 
O) " 


CD -C .^ 

£ .t: ~ 

co g 0) 

1 "D & 

<5 c E 

co co 

■^ c - 

= 8 

tt a S 


iS, oi _ 

CO 1- CO 



CO «U 
CD i=s 

« S 

^ C3) +-» 

« CO C 

C -71 CD 

P <D i- CD 

+j CO (j CO 

CD O > co 

il O S CD 



-w CO 
c ^- 

cD ?J 
"g I 
a tu iu 

c 2 = 

— Q. co 



°- c 




t— CD to 



c co 

O 3 


E -c .2 
o .t: oi 

"8 c 

« J2 

3 I 55 
S o +j 

cl a. 2 
o § £ 




£2 fa, 
'5i 2 P s 

<D 3 
"- en 

o 5 

O o^ 3 O 
cj t— to £ 














■o -^ 

> 3 
tn Q. 

> a 







3 > 

a c 



_ i— 

CO 0) 


■° ^ 
























































































♦3 -Q 

.2. "D 
"O co 

< E 




i- c 

O <0 O ^ 

> o g a c 

° ™ o g> « ro 

"O 3 .92. 2 > o 

" o t: o c 3= 

— > Q. D. "O CD O 



i- c 
O c 

«- CD 


•D O O 
CD ■— +- 

en 9> o 

S I 

— , CD 








CO — CJ 

i5> '<o !c 9> 



CD ^ — 

03 CD JD 

- E -? 

> ~ CO 

O <n > 

O <U CO 









1 1 






O c 



=> 3 

O o 

S £ 


"O > 

C O 


!2 £ 

Q. CD 
O <- 


C 3 
CD en 


T3 C 

o <" 














C co 


C ~ 

o c 

a => 

CO o 

CD (j 






9 ° "• « 

> .c -o c 

° 4- C O 

O q CD en 

if a 


■D cN TJ 





C J 












































































03 4- 

ro O 


iu E 





















































o -i: 

.E E 

> CO 

O A 



0- D 

^ c H 

CO o CO 

^. ■- crt O 

ro "a a n 

■JP C -O T3 

C <" C CD 

Q> U) CO >. 
1- C 
P O 

CD CD _ 
CD - " > ^ 

g,£ » ™ « 

co „ S -C 

CD o 

P v D. 

S t 


co P 

c x 

CO -D 

c to 2: cd 


cd en c l 

S. « CO 1) " 
^ 1- 00 +^ X 

o "g £ 

t CD £ 

O 4- O 

" o E 






oo E c 

U jQ • c C S -O 3 

^i-O CDC00CD 


4- .— 2 Q. 

n 0) u t 

a c 3 cd 


6 c 

v o 

3 ro 5 

R 3 4- O 

■S 2 s? 8 

' °. £; 3 

C 4- °? O 

D o O s: 



cj v> 

E o 

o S2 


di oi o a 



3 4- • — 

S o c 

CD w T 



— CD 

> CD S 

"E > 3 "o . 

n ° 2 c " 

u or co ro 


a « 
E g 

CJ w 








































c c 

CD c CD 
"D O "D 

a 2 a 

CD CD 0) 

§ si 

^. i_ CO 




o S: 




CD ro 
D) C 

CO — CD 

cd >• a 

> ^ c 

o ^ S 


en en 































5 -s 

(0 CD 

CJ ~ 

~ CD 

o- > 

a ra 

Z s 

o z 


Table 2. Census Coverage Measurements in Selected Developing Countries with Populations of 1 Million or More 

[The body of this table was garbled in the source and cannot be fully 
reconstructed. Its columns covered the date of the latest census, the 
method and latest estimate of undercoverage, 1 and notes on the use of 
adjusted estimates. 2 Legible row fragments include entries for 
El Salvador, Hong Kong, Sierra Leone, South Africa, Sri Lanka, and 
Yemen (Sana'a); undercoverage estimates of .66 (time series) and .42 
(PES) for one country; and notes such as "No IDDC evaluation," 
"Estimate accepted after evaluation," "Official adjustment of census 
count," and "Adjusted figures are used as official."] 

1 The International Demographic Data Center analyzed census and coverage methodologies to determine their accuracy. In those countries not 
having an "IDDC" estimate, only the country's own estimate of undercoverage is shown. 

2 For further information on sources of data from individual countries, see appendix C on sources and comments on data from selected 
developing countries. 

only national data are available with very little geographic 
detail emerging. 

The relative importance of the issue also varied greatly 
between responding countries. Census undercount was not 
an important issue in most of the countries. However, ac- 
cording to some of the responses, it had not been important 
because, until the recent census, undercount was believed to 
be minimal. In Finland, the lack of labor force characteristics 
for the 2.7 percent of the population missed was felt to be 
important, but they did not indicate the impact of the 
missing data on statistics or policy. West Germany offered 
the comment that the problem of an undercount is becoming 
increasingly important. In Canada, although no adjustments 
to census data were made in 1976, the possibility of adjust- 
ing results for undercoverage in 1981 is being seriously con- 
sidered. We have already heard about the case in Australia 
where the issue was important enough to warrant the atten- 
tion of the legislature. In the Netherlands, the issue of 
coverage, in terms of whom to enumerate, is currently 
receiving attention from the legislature. Through the last 
census in 1971, the census was used to check the accuracy 
of the population registers; however, in 1981, the census 
may be limited to persons already listed in the population 
registers. This restriction is being considered in order to avoid 
the appearance of tracking down illegal aliens in the Nether- 
lands. For the remaining countries, the issue was either not 
serious or nonexistent; thus, a conference on the undercount 
would not have attracted much attention in these countries. 

The final question was "Are census data adjusted?" Only 
Finland and Sweden (and Australia, of course) indicated 
"yes." All other countries indicated "no." In the case of 
Finland, they leave it up to the user to decide whether to 
use adjusted or unadjusted series. In the case of Sweden, 

adjusted figures are for regional planning purposes only; 
they are not used for "official purposes." 

Thus, the picture that emerges based on the review of the 
countries here is that the issues being addressed by this con- 
ference are not of general concern. Of course, the type of 
society, census conditions, the existence of population regis- 
ters, and the uses of census data provide an entirely different 
environment for census taking and the purposes of census 
data. So, perhaps it is not unexpected that the experience as 
indicated is so unlike that of the United States. On the other 
hand, it is the writers' personal observations that some 
increase in interest and concern on this issue is occurring. 
Whereas the United States routinely measures the under- 
count as part of the census operation, in the past, the very 
concept of a census undercount in these countries would 
have appeared to be alien to their thinking— it was not 
considered by the professional statisticians apart from 
concerns of quality of census content. The view seems to 
be changing and more of the countries of western and 
northern Europe will see this emerging as an important issue 
in later censuses. In fact, one indication of increasing interest 
is that the general question of whether census data are 
adjusted for use in postcensal estimates may be included 
as an agenda item of the Conference of European Statis- 
ticians. Perhaps the next conference on the undercount, 
modeled along the present one, could be convened sometime 
before the 1990 census and be truly international in scope 
and take place under the auspices of the United Nations. 
With the 1980 census almost upon us, it is not too early for 
some appropriate international party, such as the United 
Nations or one of its regional bodies, to serve as the catalyst 
for encouraging international discussion of the many issues 
considered at this conference relative to census undercount. 





The second part of the study of the international experi- 
ence with census coverage measurement involved a review of 
data on developing countries collected by the International 
Demographic Data Center at the Census Bureau. All of the 
26 countries with populations of 1 million or more included 
in this review conducted some sort of coverage evaluation 
after their censuses were completed. Twenty of these devel- 
oping countries derived their estimates through a post- 
enumeration survey. Those that did not conduct a survey used 
some form of demographic analysis to derive their estimates 
of census coverage. 

Among the developing countries with enumerated popu- 
lations of 1 million or more that were studied by IDDC, the 
estimated rates of undercoverage varied significantly from 
0.3 percent in Sri Lanka to 11 percent in Liberia. 

It is evident from table 2 and from our own experience 
that measuring census undercount even at the national level 
(in countries with deficient or weak statistical reporting 
systems) is no easy matter. The measures derived from a 
conventional PES are not always acceptable at face value and 
may require adjustment for more accuracy in reflecting 
actual levels of underenumeration. Again, the data we present 
here are intended to provide some additional perspective on 
the overall issue of census undercount. 









Appendix A 

[The list of the 25 countries of the ECE region to which letters were 
sent, with an indication of which replied, appeared here; the list was 
garbled in the source. Legible fragments include "Received reply," 
"United Kingdom," "Federal Republic of," and "21,664."] 
Source: U.S. Department of Commerce, Bureau of the 
Census. World Population: 1977— Recent Demographic 
Estimates for the Countries and Regions of the World. 
Washington, D.C. 1978. 




Appendix B 

AUSTRALIA 

Latest census: 1976 

Census counts were adjusted to account for undercoverage—adjusted 
current population figures were used for apportionment of electoral 
seats and funds to States—because their PES showed substantial 
variation in the amount of undercount between States. (They feel 
that their adjustment procedure could have been better if they had known 
prior to the census that an adjustment would have to be 
made so that their PES could have been altered to provide 
better estimates of undercount.) 

Although census counts were adjusted in the current popu- 
lation estimates to reflect the estimated undercount to 
provide official population figures, the census results them- 
selves were not adjusted, resulting in two sets of numbers. 
(One of the major problems was that users were not ade- 
quately informed about the adjustments.) 

Historical series were adjusted: the 1961 figures were left 
unadjusted, the 1966 figures were adjusted by 0.5 percent, the 
1971 figures by 1.35 percent, and the 1976 figures by 2.7 
percent (leaving some implicit underenumeration in 1966 and 1971). 
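Read as multiplicative inflation factors (an assumption; the paper does not spell out the arithmetic), the stated percentages would be applied roughly as follows, with a hypothetical census count:

```python
# Hypothetical sketch of applying the stated adjustment percentages to
# census-year bases. The multiplicative form and the count below are
# assumptions for illustration, not the documented ABS procedure.
ADJUSTMENT_PCT = {1961: 0.0, 1966: 0.5, 1971: 1.35, 1976: 2.7}

def adjusted_base(year: int, census_count: int) -> int:
    """Inflate a census count by the assumed undercount percentage."""
    return round(census_count * (1 + ADJUSTMENT_PCT[year] / 100))

print(adjusted_base(1976, 13_548_000))  # 13,548,000 x 1.027 = 13913796
```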

Census figures were adjusted for official population counts 
down to the local level depending upon metropolitan density 
characteristics. They feel confident about State adjustments 
but not about those below the State level; however, no adjustment 
greater than 4 percent was made, and none were negative. 

The PES sample size was 2/3 percent of households in 
order to provide reliable estimates on characteristics of 
missed persons. 
Respondent: Brian Doyle, Director, Evaluation and User 

Services Section, Australian Bureau of Statistics 


AUSTRIA 

Latest census: 1971 (Accuracy has been studied since 1961.) 

Coverage of the population is not a serious issue or 
concern. Coverage has been examined in three ways: 


1. In 1971, a sample of their 1.5 percent microcensus was 
matched case by case with the census forms. The 
census count was higher than the microcensus count, 
but they are not sure which was more correct. 

2. On the aggregate level, comparisons with voting lists 
showed a higher census figure (by approximately 0.5 
percent) for persons 19 years old and over. But the 
voting lists were not necessarily accurate. 

3. From a comparison of the number of foreigners' work 
permits at the time of the census, it is felt that foreign 
workers are undercounted by as much as 20 percent 
(0.4 percent of the total population). 

Accuracy measures are at the national level only and they 
do not adjust results of the census because they are not sure 
of the reliability of the accuracy measure— the measures are 
really only good for pointing up weak areas. 

A more detailed study of accuracy measures from the 
1971 census is planned. The methods used in coverage 
studies will be improved in 1981; but because of a tight 
budget, a postenumeration survey will not be done. 
Respondent: Dr. Lothar Bosse, President, Austrian Central 
Statistical Office 



BELGIUM 

Latest census: 1970 

The census is used to update population registers that are 
kept at commune level. Registers have records of residence 
of Belgian nationals and of foreigners who are resident in Belgium. 

Registers are updated by vital records and reports of 
arrivals and departures from the country, by usual place of residence. 

The registers are "renewed" by the census results. Census 
forms are filled in by heads of households as required by law. 
The census results are checked against the registers and 
entries are made in the registers after verification of where 
the error or the omission occurred. 

The official census figures have been found to be less 
than 0.5 percent lower than the figure for the same year 
from the registers. 

No adjustment is made to the series for underestimation 
of the population in the census. They are not concerned 
with that small amount of undercoverage. 
Respondent: A. Dillaerts, Director General, Ministry of 

Economic Affairs, National Statistical Institute 


BULGARIA 

Latest census: December 1975 (population register) 

Two postcensal sample surveys were carried out: population 
coverage completeness and accuracy of registration. 

Sample size in the population coverage survey was 3
percent (or 26,000 persons); about 0.31 percent of the population
were missed; about 0.37 percent were double enumerated;
thus the net error was a 0.06-percent overcount; 0.39 percent
were incorrectly enumerated by place of residence.

• Men more likely to be counted wrong than women.

• Highest frequency of errors for population less than 5
years old and 15 to 25 years old.

• Urban population more frequently miscounted than
rural, also large towns compared to small towns.

The net error of 0.06 percent was considered too small to
warrant adjustment of the data.
Respondent: Committee of the Unified System for Social
Information

Canada

Latest census: 1976 (censuses conducted quinquennially)

Methodology for measuring undercount: reverse record
check (RRC). First used on a small scale in 1961; since 1966,
conducted on full scale. The RRC in 1976 was designed to
provide measures of undercoverage of population down to
the province level and for certain population subgroups. The 
Yukon and Northwest Territories were not included in the 
RRC because of the greater difficulties and higher cost that 
would have been involved. 

The sample was constructed using four frames: 

1. Persons enumerated in 1971 census 

2. Persons born between June 1, 1971 and May 31, 1976 
(from vital statistics records) 

3. Immigrants to Canada between June 1, 1971 and 
May 31, 1976 (from immigration registrations) 

4. Persons not enumerated in 1971 census (from a ran- 
dom sample of the 1971 reverse record check). 

The total sample was 33,111 persons. The RRC used 
higher sampling rates in smaller provinces and for subgroups
expected to have higher undercoverage. Each of the 33,111 
selected persons (SP) was searched in 1976 census returns. 
The search started with addresses in 1971, then went on to 
telephone directories, social and welfare agencies, and tax 
returns. At the end of the check, each SP was classified as 
enumerated in 1976 (88.2 percent), missed in 1976 (2.5 
percent), died (3.2 percent) or emigrated (1.3 percent) 
before 1976, or tracing failed (4.8 percent). 

Results: After adjustments to cover reweighting for 
nonresponse to the sample, undercoverage was estimated 
at about 2.04 percent of the total population (about 
476,500 persons). Significant differences were found in the 
amount of undercoverage by province. Quebec had the 
highest rate of undercoverage (2.95 percent). Certain sub- 
groups of the population also had differentially higher rates 
of undercoverage: Young persons, males, single persons, 
persons whose mother tongue was other than English 
(French or other). Overall, rates of undercoverage seemed to 
decline from 2.6 percent in 1966 to 2.0 percent in 1976.
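The arithmetic behind an RRC undercoverage rate can be sketched directly from the classification shares quoted above. This is an illustrative simplification only: persons who died or emigrated before census day drop out of scope, and the published 2.04-percent figure also reflects Statistics Canada's reweighting for sample nonresponse and tracing failures, which this sketch omits.

```python
# Illustrative sketch of a reverse-record-check (RRC) undercoverage rate.
# The classification shares are those quoted above for Canada's 1976 RRC;
# the simple ratio below ignores the reweighting for nonresponse and
# tracing failures, so it does not reproduce the published 2.04 percent.

def rrc_undercoverage(enumerated, missed):
    """Share of in-scope persons missed: missed / (enumerated + missed).
    Persons who died or emigrated before census day are out of scope."""
    return missed / (enumerated + missed)

# Outcome of tracing the 33,111 selected persons (percent of sample):
shares = {"enumerated": 88.2, "missed": 2.5, "died": 3.2,
          "emigrated": 1.3, "tracing_failed": 4.8}

print(f"{rrc_undercoverage(shares['enumerated'], shares['missed']):.2%}")
```

The gap between this naive ratio and the published 2.04 percent is the effect of the reweighting step described above.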

Census counts were not adjusted for undercoverage,
although estimates of undercoverage were prepared to the
provincial level and for other demographic variables. But the
issue of preparing adjusted figures is being seriously
considered for 1981.

The issue of undercoverage is important because of
federal/provincial funds transfers which are linked to census
counts.
Respondent: Ivan P. Fellegi, Assistant Chief Statistician,
Statistics Canada


Czechoslovakia

Latest census: 1970

Completeness of coverage measured through comparison 
with population movement statistics (population register 
system is in place). 

An adjustment on the basis of an estimated undercount is 
not made, but corrections are made for specific detected
errors.
Difference between census data and population movement 
statistics "negligible." 
Respondent: Jan Kazimour, President, Federal Statistical Office



Denmark

Latest traditional population census: 1970 (mail-out type
census; population registers in use)

Since 1968, a number of registers— population, dwellings, 
tax information, school enrollment— have been established 
using common identification numbers of persons and other 
units. In 1976, for the first time, a "register based" census 
was taken combining information from each national register. 
In 1981, the entire census will be taken from the registers, 
using the identification numbers rather than forms mailed 
out to and filled by respondents specifically for the census. 
Respondent: Lene Skotte, Danmarks Statistik

Federal Republic of Germany 

Latest census: 1970 (also studied 1961 results for coverage) 

Studies were made of coverage and quality of the census;
the estimated undercount was about 0.9 percent of the resident
population. Counts are not adjusted. Their studies, called
process controls (or descriptive checks), are used to detect 
causes of errors and to interpret census results. Three 
separate studies were made of census results starting from 
samples of individual documents. 

1. Immediate checks: (1970) A followup survey of 0.1 
percent to 0.2 percent of population in selected con- 
trol districts carried out 4 to 6 weeks after census day 
using a separate questionnaire. Its purpose is to check 
completeness and precision of coverage of buildings, 
households, and persons; it is intended particularly to 
trace omissions and double counts. 

2. Birthday selection: (1970) All persons covered by 
population census with birthdays on 31st of March, 
May, and July. Coverage of residence of persons with 
more than one residence to eliminate double counting. 
(In 1961, an alphabetic check was used by selecting a 
sample based on the first letter of surnames.) 

3. Checks of characteristics: Comparison with subsamples 
from the microcensus to ascertain quality of the data. 

Data on national level only. In 1981, the issue of coverage 
completeness will be of greater concern due to problems 
resulting from poor respondent attitudes, high proportion 
of foreigners, and increasing mobility of population (and 
multiple residences). 
Respondent: Dr. Hamer, Vice-President, Federal Statistical Office



Finland

Latest full-scale census: 1970 (In 1975, a small-scale census
or microcensus was carried out.) 

In previous censuses (1950, 1960, and 1970), undercount
was not studied. In 1975, coverage of persons and dwellings was
investigated. Undercount was studied based on the population
register: everyone registers every year on January 1; each
person has an ID number and characteristics such as educational
attainment recorded. The census forms were sent out to 
everyone on the register. 

About 2.7 percent of the registered population did not 
return their census forms. Thus, for this proportion of the 
population, no characteristics are available beyond what was 
in the population register; that means for those persons, no 
information is available on occupation and industry. 

About 2.0 percent of the housing units were omitted 
compared to records of construction statistics and the build- 
ing registers which are maintained in some communes. This 
was considered an "undercount," as no centralized address 
register exists. 

The persons not counted in the census were disproportionately
young (less than 5 years old or 20 to 29 years), male,
divorced, or widowed.

Geographic detail was down to the level of communes 
(475 in Finland). 

Adjusted data were prepared at commune level for the 
population, for housing units, and for labor-force data by 
major industry grouping. 

The user decides whether to use adjusted or unadjusted 
data. In community planning, adjusted data used. 

Coverage in the census is important because prior to the
1975 census, nonresponse was assumed to have been negligi-
ble. In 1980, coverage improvement work will be intensified,
better information about the census will be provided, and a
special study of reliability based on interviews will be carried out.
Respondent: Olavi E. Niitamo, Director, Central Statistical Office of Finland




France

Recent censuses: 1962, 1968, 1972

Coverage study was done for 1962 census, nothing more 
recent. They feel that in the succeeding censuses, the level of 
coverage declined by comparison with "current population 
evaluations." However, this may not be true because their 
migration statistics are of doubtful quality. 

Measurement of coverage in 1962 revealed a net undercoverage
of 1.7 percent. Coverage was measured by use of:

1. Comparison with existing registers, i.e., voting, social 
security, birthdate. 

2. Field checks to catch double counts. 

3. A postcensal survey using area samples. The total 
sample size was 20,000 housing units— 400 area sample 
units of about 50 housing units each. The sampling 
units were intentionally small so each would be rela- 
tively homogeneous. The area sample units were 
defined on maps by physical landmarks. The sampling rate
was 1 in 1,000 for most of the country, 1 in 500 in
urban areas in excess of 80,000 population, and 1 in
250 in areas of the greatest population concentration.
The area sampling units proved to be small, making 
the survey a delicate and unwieldy task. 
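The three differential sampling rates described above translate into expected sample sizes stratum by stratum. In the sketch below the housing-unit totals passed in are hypothetical placeholders, chosen only to illustrate the mechanics; they are not figures from the French report.

```python
# Sketch of the 1962 French coverage survey's differential sampling rates,
# as described above. The housing-unit counts supplied at the bottom are
# hypothetical placeholders, not figures from the report.

RATES = {
    "greatest population concentration": 1 / 250,
    "urban areas over 80,000": 1 / 500,
    "rest of country": 1 / 1000,
}

def expected_sample(units_by_stratum):
    """Expected number of sampled housing units in each stratum."""
    return {s: round(n * RATES[s]) for s, n in units_by_stratum.items()}

print(expected_sample({
    "greatest population concentration": 1_000_000,  # hypothetical
    "urban areas over 80,000": 2_000_000,            # hypothetical
    "rest of country": 12_000_000,                   # hypothetical
}))
```

Oversampling the dense strata in this way buys precision where coverage error is expected to concentrate, at the price of the fieldwork burden noted above.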

Coverage of the census is considered a statistical problem 
rather than a political/administrative one. Census results are 
used administratively but the local governments themselves 
are responsible for the census in their own communities and 
do the recruiting for enumerators in their area. 

No systematic adjustment was made to the census results 
in 1962 for coverage. A "population evaluation" sample is 
drawn from the crude data from most recent censuses. The 
few corrections that were made were not related to the 
coverage study. 
Respondent: Edmond Malinvaud, Director General, National
Institute of Statistics and Economic Studies


Greece

Latest census: 1971

The census was taken in 1 day by enumerators, to ascer- 
tain a de facto population. Immediately after, a sample 
survey was taken to measure coverage error. Estimated less 
than 1 percent of population was missed— only available at 
national level. Census data were not adjusted. 

The sample was split: One part was used to measure the
proportion of housing units and entire households missed;
the second part measured, within households, how many persons
were missed.

The multistage area sample was selected from 124 strata 
split into three levels: 

1. Cities of more than 40,000 persons

2. Cities of 5,000 to 40,000 persons 

3. The rest of the country 

Enumerators were generally civil servants such as teachers. 

Census undercount is not considered to be an important 
issue as it was so small; however, postenumeration surveys 
will continue to be carried out. (In 1961, according to an
enclosed report, the greatest undercoverage occurred in 
Greater Athens but was balanced by large duplications in 
the rest of Greece.) 
Respondent: Chr. Kelperis, Director General, National
Statistical Service


Hungary

Latest census: 1970

A postenumeration survey was conducted to measure 
completeness and quality of the content of the census. 

It was not used to correct errors in the census but rather 
to detect problems and plan for future censuses. The sample 
size was 0.25 percent of the population. The survey was 
conducted like a second census, drawing from a regional 
quartering of census districts. It took place directly after the 
census. The sample included 7,900 households and 27,000 
persons. Prior to the time of the actual survey, sampled 
households were contacted to ensure cooperation. 

Although the PES was intended to measure quality of 
data collected rather than coverage, 0.4 percent of the 
population was estimated to have been missed. 
Respondent: Dr. Vera Nyitrai, President, Central Statistical Office



Italy

Latest census: 1971

Comparisons were made of census counts with communal 
population registers. In 1971, the registers were about 
900,000 higher (1.7 percent) than census counts. This was 
blamed on failure to keep the registers up to date with 
emigration. Southern Italy in particular has high emigration. 
They now concentrate on updating their communal
registers for emigration between censuses.

Each census form is compared with register cards for each 
commune. Their estimate of coverage is made only at the 
national level. The census was used to judge the quality 
of the communal registers. 
Respondent: Luigi Pinto, Director General, Central Statistical Institute


Netherlands (Response received after conference.) 
Latest census: 1971 

Municipal population registers are the basis of the popula- 
tion census. From the registers, data on population by age, 
sex, marital status, and nationality are compiled annually. 

Prior to 1971, the census was required to count all per- 
sons who were supposed to be on the register. The census 
was used to check the completeness and accuracy of the 
register by carrying through a case-by-case check of all 
persons listed in the registers and on the census. In 1971,
this case-by-case check was not carried out; also, no record
was made of the illegal aliens in the Netherlands. Because the
check was not carried out, a difference of 0.5
percent between the total number of registered persons and
persons listed in the census remained.

About 2.3 percent of the persons listed on the municipal 
registers failed to respond to the census; they were either 
not at home or refused to cooperate. However, charac- 
teristics for these persons were available from their records 
in the municipal registers, and thus they were included in 
census publications. 

For the 1981 census, it is possible that legislation will be 
passed requiring the census to exclusively enumerate persons 
already registered in the municipal registers to prevent any 
implication of the census being used to track down illegal 
aliens in the Netherlands. 
Respondent: J. Schmitz, Head of the Department for Social
Accounts, Central Bureau of Statistics


Norway

Latest census: 1970

In 1970, the census was used as a control for the popula- 
tion register; in that year, the central register was matched to 
the local registers. Since 1970, no quality study of the regis- 
tration system has been made, but it is felt that it provides 
good information down to the municipality level. 

The basis of the census in 1980 will be a central popu- 
lation register at the Central Bureau of Statistics. The Central 
Bureau has received monthly updates from local registers 
since 1970. The 1980 census will have an evaluation survey. 
It should give some information on the completeness of the
census.

They, too, are trying to determine how to present measures
of incompleteness in the census.
Respondent: Erik Aurbakken, Acting Director, Central
Bureau of Statistics


Poland

Latest census: 1978

In 1978, for the first time, a postcensus investigation on a
1-percent sample was conducted to check correctness of the
census by comparing census data with data collected by 
enumerators in dwellings in the sample. Results of this study 
will be used only to check accuracy of the census, not to 
adjust it. Poland has a population register system. 

The census data were used to correct the current
estimates of the population, which are kept current from 
vital statistics and migration data. Corrections of the current 
estimates based on the census at the national level varied 
from 0.3 to 0.4 percent, so no problems were created for 
data users. At the local level, greater differences between 
estimates and census counts were found, due to inaccuracies 
in registration of internal migration. 

Respondent: Prof. Stanislaw Kuzinski, President, Central 
Statistical Office 


Portugal

Latest census: 1970
Forthcoming census: 1981 

No coverage control for technical accuracy of 1970 census 
was undertaken, although coverage work was done in some 
areas of the country. 

In 1981, a coverage control with technical accuracy is 
planned. The methodology to be used is now being studied 
by the National Statistical Institute. They will send the
Bureau a report on the methodology when the research is
completed.
Respondent: J. F. Graca Costa, President, Council of
Direction, National Statistical Institute


Romania

Latest census: 1977

Completeness of the census was considered satisfactory 
when resident population from the census was approximately 
equal to figures derived from current records. No adjustment 
was made, but detected omissions were corrected at all 
geographic levels. 

A number of matching studies were undertaken during the
census:

1. Several days before census took place, a preliminary 
visit was made to identify housing units and the 
number of persons per unit. 

2. Matching at local level with voting lists, agriculture 
register, and population register. 

3. At centralized level, matching of migration records 
within country to responses on census item. 

After the census, a "check survey" was carried out. A 
sample of households was selected by a two-stage sampling 
process. In the first stage, a sample of census sectors was 
selected and in the second, a sample of households. The 
final sample size was 11,750 households (0.18 percent of the
total) and 41,355 persons (0.19 percent of the total). This survey
estimated that 0.094 percent of households and 0.155 percent
of individuals in households were missed at the national
level. (Omissions were largely due to interviewers
misinterpreting instructions.)
Respondent: Ilie Șalapa, Director General, Central Statistical Directorate


Spain (Response received after conference.) 

Latest census: 1970 

Spain maintains a population register listing characteristics
of persons by usual place of residence and previous place of
residence.


After the 1970 census, an evaluation survey was carried 
out at the national level, stratified by metropolitan area, to 
study errors in coverage and content in the census. A sample 
of area segments was selected in which a reinterview 
(although the method of the census was partially self- 
enumerative) was conducted to check residence and 
occupancy in housing units on the day of the census. 

The results of the survey revealed a 2.3-percent overcount 
of the de jure population. As a consequence, the issue of 
coverage errors is not considered to be important. 

The unadjusted census population figures are used as 
official counts for administrative as well as other official 
purposes. However, adjusted figures are used for demo- 
graphic studies and population projections. 
Respondent: F. Azorin, President, National Statistical
Institute

Sweden

Latest census: 1975

Censuses were conducted in 1970 and 1975. Two popu- 
lation registers are maintained (one of the total population 
and the second for use in sampling all persons born on the 
15th of each month). 

Completeness of coverage is not measured, but each 
census form is checked off against register entries. The 
population register is continually updated. In evaluation 
studies after 1970, a pilot study was made of completeness 
of coverage of housing units (conducted by mailmen). Also, 
an attempt was made to measure completeness of coverage 
of the economically active population. Census data were then 
compared with data collected in the two coverage studies. 
Estimates of coverage were made only to the national level 
and that of some metropolitan areas. 

Adjustments of economically active persons and housing
units were made for regional planning but were not official
census data.

Sweden is concerned with coverage, especially for the 
economically active population, but "coverage" refers to 
the determination of whether or not a person was economi- 
cally active. 
Respondent: Lennart Fastbom, Deputy Director General,
National Central Bureau of Statistics


Switzerland

Latest census: 1970

No special study of completeness of coverage was made. 
Local administrators checked incoming questionnaires against 
population registers; rectification of enumerator errors was 
completed locally, and the forms were sent to the central 
census office. 

Based on the experience of larger cities, the number of 
individuals missed in 1970 was not more than 0.3 percent 
of the total population. Census results are not adjusted. 

Respondent: R. Rotach, Section of the Census, Federal 
Office of Statistics 


U.S.S.R.

Latest census: 1979

Coverage measures used: Postenumeration survey, issuance 
of census certificates, and completion of control forms. The 
postenumeration survey was carried out directly after the 
census in 25 percent of the dwellings in urban areas and in all 
dwellings of 25 percent of the rural sections by inspectors 
and enumerators. Inspectors checked registrations of all 
families and single dwellers; those missed by the census were
then enumerated.
Census certificates were issued to all transients (long- and
short-term) and to all persons even contemplating travel
during the period of the census and the postenumeration
survey.
Control forms were filled in by enumerators during the
census and by inspectors during the PES for individuals
who thought they had been missed. After a comparison of 
the control forms to the census records, information for 
missing persons was transferred to census forms. 

The respondent indicated that these measures assured 
complete enumeration of the population. 
Respondent: M. A. Korolev, First Deputy Director, Central
Statistical Board

United Kingdom 

Latest census: 1971 

The issue of an undercount is not considered to be impor- 
tant; however, they feel that it is important to go out and 
prove that by some sort of coverage study. Census results are 
not adjusted. 

In 1971, reenumeration and independently compiled lists 
were used. Both were found to be unsatisfactory. (The first 
was not independent; the second was not adequately
controlled.)
In 1981, a small independent unit of people will be set up 
in a central office as well as in the field with the sole respon- 
sibility of coverage checks. Addresses will be checked against 
local taxation lists. Also a 0.5-percent sample will be taken 
for a labor force survey after the census, identified by enu- 
meration district, to match against census results. The sample 
will be larger than normal so the special coverage unit can 
conduct reenumeration interviews on a sample independent 
of the census. 

The size of the sample, 0.5 percent of the addresses, will 
allow estimates to be made down to the regional level. So far, 
only addresses are measured, not people. 
Respondent: S. C. Boxer, Head of Census Division, Office of
Population Censuses and Surveys

Yugoslavia (Response received after conference.) 


Latest census: 1971 

Following the census, a coverage control study was 
carried out to detect omissions or "double counting" of 
persons, households, and housing units. This coverage study 
was conducted in a stratified sample of enumeration districts 
covering each republic and autonomous province. Of the 
total 83,593 enumeration districts in Yugoslavia, 403 were 
in the sample. 

Results of the survey revealed an undercount of 0.8 per- 
cent of persons and about 0.9 percent of households. There 
were some differences in the proportions of undercount in 
the individual provinces and by rural or urban residence; 
however, these differences were small. Persons who were 
reported to be employed abroad had a very high undercount, 
around 7.7 percent. Census results were not adjusted based 
on the results of the coverage control study. The level of 
reliability of the census results as revealed by the results of 
the coverage study is considered to be important information 
for governmental agencies and for other data users. The 
results of the coverage study are used mainly for planning 
for the next census. 
Respondent: Ibrahim Latific, Director, Federal Statistical Office





Algeria

Direction des Statistiques et de la Comptabilité
Nationale. La Situation Démographique en Algérie 1967-
1978, p. 6. Algiers, 1979. 

Official adjustment of population is based on a PES. 
Algeria includes adjustment for undercount in official 
postcensal estimates. 


Bangladesh

Bangladesh. Census Commission. Bangladesh Popula-
tion Census 1974. Bulletin 2. Census Publication No. 26. 
Dacca, 1975. Table 1. 

Bangladesh. Census Commission, and United Kingdom. 
Ministry of Overseas Development. Report on the 1974 
Bangladesh Retrospective Survey of Fertility and 
Mortality. London, 1977, p. 3. 

U.S. Department of Commerce, Bureau of the Census.
WP-79 is based on the figure in table 2. The officially
cited underenumeration estimate of 6.88 percent is 
derived by using the unadjusted population as a base. 
The Bureau's practice is to base the percentage on the 
adjusted population, which in this case reduces the per- 
cent undercount to 6.44 percent. Bangladesh bases its 
official time series on this adjusted census figure. 
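The two Bangladesh percentages are the same count of missed persons expressed on different bases, so one converts to the other with a one-line formula:

```python
# Converting an undercount rate from an unadjusted base to an adjusted base.
# If M persons were missed from an enumerated count C, then:
#   M / C       -> rate on the unadjusted (census) base: 6.88 percent
#   M / (C + M) -> rate on the adjusted base, the Bureau's practice

def to_adjusted_base(rate_on_unadjusted):
    return rate_on_unadjusted / (1 + rate_on_unadjusted)

print(f"{to_adjusted_base(0.0688):.2%}")  # -> 6.44%
```

The 6.88-percent and 6.44-percent figures cited above thus describe the same shortfall; only the denominator differs.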


Bolivia

Unpublished estimates and projections prepared by
Mario Gutierrez Sardan of INE (Instituto Nacional de
Estadística, La Paz) with the collaboration of CELADE
(1979, table 2). 

Official adjustment of population is based on a PES. 
Upon preliminary review, INE's analysis seems good; a 
7-percent figure is higher than most. The INE figures will 
be official once projections are published. 

The previous Census Bureau estimate of underenu- 
meration was 4.23 percent, based on an analysis of pre- 
liminary sample census figures. Pending further analysis 
of the final census and of the results of the PES, the 
Census Bureau is accepting the official estimate of under-
enumeration.

Cameroon

Direction de la Statistique et de la
Comptabilité Nationale. Recensement Général de la
Population et de l'Habitat d'Avril 1976. Volume II,
tome 1 (1979). Yaoundé, p. 7.

The official estimate of underenumeration is based on 
a postenumeration survey. 

Chile

Oficina de Planificación Nacional. Proyección
de la Población de Chile por Sexo y Grupos Quinquenales
de Edad, 1950-2000. Santiago, 1975.

U.S. Department of Commerce, Bureau of the Census. 
Country Demographic Profiles— Chile by Sylvia Quick. 
Washington, D.C., 1978. 

The Chile Oficina de Planificacion Nacional (1975) 
estimated midyear population figures for every year, 1950 
to 1970, based on demographic analysis. The U.S. Bureau 
of the Census (1978, table 2) estimated the adjusted 
census population for 1970 shown above based on the 
official midyear 1970 population and the 1969 to 1970 
growth rate. 


Colombia

DANE (low): Departamento Administrativo Nacional
de Estadística (DANE). Boletín Mensual de Estadística,
no. 314 (Sept. 1977), p. 31.

DANE (high): Departamento Administrativo Nacional
de Estadística. Boletín Mensual de Estadística, no. 308
(March 1977), p. 9.

González. XIV Censo Nacional de Población y III de
Vivienda: Cobertura Censal, 1976, table 7.

U.S. Department of Commerce, Bureau of the Census. 
Country Demographic Profiles: Colombia. Washington, 
D.C., 1979, table 2. 

The range in the "enumerated" population is attrib- 
utable to different estimates of the population in the 
Armed Forces and the population in sparsely populated 
areas (National Territories). The range in the adjusted census
following the postenumeration survey of 1974 is due to a
range in estimates for Bogotá, the indigenous population,
the Armed Forces, and group housing. See discussion of
census and PES in: Ochoa, L.H., and M. Pardo. "Estima-
ciones de la Población de Colombia en 1973: Una Recon-
strucción Crítica." Unpublished working paper. Pontificia
Universidad Javeriana, 1979.

It is not clear which estimate is official, as new 
estimates continue to be published. 

U.S. Census Bureau estimates are based on projections 
of a 1964 adjusted census population, and estimated 
fertility, mortality, and migration data. 

Ecuador

Results of postenumeration survey conducted by
Oficina de los Censos Nacionales (OCN) reported by 
Cavallini, G. "Informe de la Misión de Asesoría Realizado
en la República del Ecuador desde el 6 al 30 de Marzo de
1976." United Nations. Unpublished mission report, 
1976, p. 24. 

Estimates of OCN and the Bureau of the Census are very
close. An alternative evaluation by Carlos Cavallini via
demographic analysis sets total underenumeration at 4
percent; however, differences at ages over 5 between the
PES (Chandrasekaran-Deming) and demographic methods
were insignificant.

El Salvador 

Consejo Nacional de Planificación y Coordinación
Económica (CONAPLAN). Indicadores Económicos y
Sociales, January-June 1976. San Salvador.

and Dirección General de Estadística y Censos
(DIGESTIC). La Población de El Salvador por Sexo y
Edad en el Período 1952-2000, Principales Indicadores
Demográficos. San Salvador, 1976.

Estimate for the census date based on the official 
adjusted midyear population for 1970 (CONAPLAN and 
DIGESTIC, 1976, table 17) and an estimated midyear 
1970 to midyear 1971 growth rate. 

Hong Kong 

For PES: 

Hong Kong Census and Statistics Department. Country 
Report of Hong Kong. 1977 Mimeo. 

Hong Kong Population and Housing Census
1971: Main Report. Hong Kong, 1972, p. 8.
For Official Estimates: 

Hong Kong Monthly Digest of Statistics. Hong
Kong, 1979, table 15.1.


India

U.S. Department of Commerce, Bureau of the Census.
Country Demographic Profiles - India. Washington, D.C., 
1978, p. 4. 

India, Registrar General and Census Commissioner. 
Census of India 1971. General Population Tables. Series 
1 - India, Part ll-A (i). New Delhi, 1975. 


Indonesia

East-West Population Institute. Proceedings of 3rd
Population Census Tabulation Workshop-Conference, 
Postcensal Considerations. Honolulu, 1974, p. 20. 

U.S. Department of Commerce, Bureau of the Census. 
Country Demographic Profiles - Indonesia. Washington, 
D.C., 1979. 


Iran

Statistical Centre of Iran. National Census of Popula-
tion and Housing, November 1976, Based on 5-Percent 
Sample, Total Country. Table 1, 1978. 

Eorey, Joseph. U.N. Development Program Office, 
Tehran. "Progress Report on the 1976 Iranian Population 
and Housing Census." Abstract of report in East-West 
Center, East-West Population Institute. Asian and Pacific 
Newsletter, vol. 4, no. 4 (May 1978), p. 3. Honolulu. 

Official adjustment of population is based on a PES. 
Iran has not included adjustment for undercount in 
official postcensal estimates. PES result is "preliminary." 
IDDC is using the reported 3-percent undercount but 
suspects the undercount was higher. 


Israel

Central Bureau of Statistics. The Demographic Charac-
teristics of the Population in Israel 1972-1976. Jerusalem, 

Adjustment was made based on 1961 census and 
births, deaths, and migration. Adjustment was made for
Jews only and for ages up to 9 years only.


Jordan

Official estimate of underenumeration is reported in
UNPVSR, October 1979. 

The estimate was not obtained through a postenumeration
survey; the method used to arrive at 4.0 percent is un-
known. Official census volumes and subsequent popula-
tion estimates utilize the official adjusted census popu-
lation.
International Demographic Data Center accepts the 
4.3-percent underenumeration recommended in an analy- 
sis by Wander (in Central Bureau of Statistics. Analysis 
of the Population Statistics of Jordan. Amman, 1966). 
Note that the 4.3-percent underenumeration estimated by 
Wander is published by the same organization as carried
out the census. This figure is not used in subsequent
official estimates. 


Republic of Korea

Marks, Eli S., and Glenda Finch. "Developments in
Techniques of Census Evaluation." Unpublished paper 
presented at the biennial meeting of the International 
Association of Survey Statisticians. New Delhi, 1978. 

Department of State Airgram, No. 2-34, March 23, 


U.S. Department of Commerce, Bureau of the Census. 
Country Demographic Profiles— Republic of Korea. 
Washington, D.C., 1978. 

The evaluation of the 1970 census done at the Census 
Bureau was based, in part, on the results of the 1970 
PES, which are not accepted as official estimates of 
underenumeration. Official population figures that are
shown in publications are based on unadjusted census
counts.

Liberia

Ministry of Planning and Economic Affairs. 1974
Population and Housing Census. Final Population Results 
for Liberia and Major Political Divisions. PC-1. Monrovia, 
1977, pp. 18 and 60. 

The official enumerated population is not available. 
The de jure census population was officially adjusted, 
based on a postenumeration survey. 


Malaysia, Department of Statistics. An Interim Report 
on the Postenumeration Survey. Kuala Lumpur, 1973. 

U.S. Department of Commerce, Bureau of the Census. 
Country Demographic Profiles— Malaysia. Washington, 
D.C., 1979. 

Official adjustment of population is based on a PES. 
Estimates of underenumeration derived at the U.S. Bureau 
of the Census imply that the PES estimate of under- 
enumeration was a little too low. 


Pakistan Statistical Division. Census Evaluation Survey, 
Population Census, 1972. Karachi, 1974. 

Official estimates of underenumeration are based on 
the results of a PES. Pakistan does not use the inflated 
total in current estimates. 


The PES estimate is based on the combined-procedures 
estimate in table 10.3 of Marks, Eli S. "The Role of Dual System 
Estimation in Census Evaluation," in Krotki, Karol (ed.). 
Developments in Dual System Estimation of Population 
Size and Growth. Edmonton, Alberta: University of 
Alberta Press, 1978, pp. 156-188. 

The PES results are preliminary, and there has been no 
communication from the Dirección General de Estadística 
y Censos as to whether they have accepted these PES 
results. It should also be noted that: (a) There were 
actually two procedures used which yielded different 
results. The above results are a combination of the two. 
(b) The PES did not cover all of Paraguay; it excluded the 
Chaco and some other sparsely settled areas (Marks, 1978, 
p. 172) and the institutional population, (c) The coverage 
estimates are subject to sampling variability; the standard 

deviation of the estimated percent completeness is about 
0.7 percent. 
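The dual-system (Chandrasekar-Deming) estimator that underlies such PES coverage figures can be sketched as follows; all counts in the example are hypothetical, not the Paraguayan results:

```python
# Dual-system (Chandrasekar-Deming) estimator, a minimal sketch of the
# technique named in the Marks (1978) reference above.
# All counts below are hypothetical.

def dual_system_estimate(census_count, pes_count, matched):
    """Estimate the true population of an area canvassed by both the
    census and an independent postenumeration survey (PES)."""
    if matched == 0:
        raise ValueError("no matched cases; estimator undefined")
    return census_count * pes_count / matched

# Hypothetical PES area: 9,500 persons counted by the census, 9,800 by
# the PES, with 9,215 persons matched in both sources.
n_hat = dual_system_estimate(9_500, 9_800, 9_215)
undercount_pct = 100 * (1 - 9_500 / n_hat)  # implied net undercount
```

The estimator assumes the two counts are independent; the correlation bias discussed elsewhere in these proceedings arises when the PES tends to miss the same people the census missed.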

The U.S. Bureau of the Census estimated underenu- 
meration is higher due to the use of the PES adjustment 
for ages 5 years and over and an adjusted population un- 
der age 5. This population was derived from the adjusted 
female population and estimates of fertility and mortality 
for the 5 years prior to the census. 


Oficina Nacional de Estadistica y Censos (ONEC). 
"Perspectives de Crecimiento de la Poblacion del Peru 
1960-2000." Boletin de Analisis Demografico, no. 16 
(1975), p. 103. 

The estimates of underenumeration from the analysis 
of ONEC are based on the midyear 1970 population, 
reverse-survived from the enumerated June 4, 1972 
population, compared with the 1970 population corrected 
for both household underenumeration and the average 
number of persons per household. The corrections used 
were based on the results of the postenumeration survey, 
which indicated an underenumeration of 3.46 percent of 

The procedure used by the U.S. Bureau of the Census 
to estimate underenumeration for Peru is similar to the 
one described above. 
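The reverse-survival comparison described above can be illustrated with a minimal sketch; the survival ratio and populations below are hypothetical stand-ins, not the actual Peruvian figures, and a real application would use age-specific survival rates:

```python
# Reverse-survival sketch: carry an enumerated population backward in
# time by dividing out expected survival, then compare with an
# independently corrected estimate for the earlier date.
# All numbers below are hypothetical.

def reverse_survive(enumerated, annual_survival, years):
    """Back-project an enumerated population by removing the survival
    expected over the intervening years (migration ignored)."""
    return enumerated / (annual_survival ** years)

# Enumerated mid-1972 population, reverse-survived to mid-1970 assuming
# 99.2 percent of the population survives each year.
implied_1970 = reverse_survive(13_570_000, 0.992, 2)

# Corrected mid-1970 estimate (hypothetical), e.g. from a PES-adjusted
# household count; the gap implies the level of underenumeration.
corrected_1970 = 13_950_000
underenumeration_pct = 100 * (1 - implied_1970 / corrected_1970)
```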

Official projections from CELADE and ONEC imply 
4.9 percent underenumeration. 

Sierra Leone 

The final census population and adjusted population 
are reported in PVSR October 1979. 

No postenumeration survey was carried out. 

South Africa 

Department of Statistics. South African Statistics 1978. 
Pretoria. P. 1.4. 

Estimate of underenumeration calculated at the U.S. 
Bureau of the Census from official South African time 
series. The method of estimation used in deriving official 
South African time series is unknown. No PES was 
known to have been taken. 

Sri Lanka 

Nadarajah, Thambiah. "Sri Lanka. The 1971 Census of 
Population and Housing." Introduction to Censuses of 
Asia and the Pacific 1970-74, edited by Lee-Jay Cho. 
Honolulu: East-West Population Institute, 1976, p. 179. 

Official estimate of underenumeration is based on pre- 
liminary analysis of the December 1971 postenumeration survey. 

Census population adjusted at the U.S. Bureau of the 
Census is based on the preliminary census figure of 
12,712,277, adjusted for estimated underenumeration 
based on demographic analysis. 


Official Sri Lankan estimates are not based on adjusted figures. 

Sudan 

Department of Statistics, Population Census Office. 
1977. Second Population Census 1973, vol. 1, tables 9 
and 12. 

Official adjustment of population is based on a PES. 
Sudan has not regularly included adjustment for under- 
count in official postcensal estimates (not, for example, 
in estimates reported in PVSR). IDDC is using the official 
estimate of the undercount but suspects the undercount was higher. 

Thailand 

For enumerated population: 

National Statistical Office. 1970 Population and 
Housing Census. Whole Kingdom. Bangkok, 1973. Table 

For estimated underenumeration and adjusted population: 
Arnold, Fred, and Mathana Phananirama. Revised 
Estimates of the 1970 Population of Thailand. Research 
Paper No. 1, National Statistical Office. Bangkok, 1975. 
Table 13. 
For Bureau of the Census estimates: 

U.S. Department of Commerce, Bureau of the Census. 
Country Demographic Profiles— Thailand. Washington, 
D.C., 1978. Tables 1 and 2. 

Yemen (Sana'a) 

Steffen, Hans. Yemen Arab Republic: Final Report. 
Airphoto Interpretation Project, Swiss Technical Coop- 
eration Service, carried out for the Central Planning 
Organization, Sana'a, 1978, pp. I/57-59. 

Official adjustment of population is based on a PES. 
The official adjusted population includes an estimated 
population of 48,602 living in areas not covered by 
the census. This adjustment is in addition to the PES adjustment. 

No official postcensal estimates by the government of 
Yemen (Sana'a) are available. 





One of the significant emerging issues of the 1970's in 
connection with census taking is the completeness of 
coverage of the population. In the United States, as you may 
have read, our studies have shown that in the 1970 census we 
missed about 2 1/2 percent of the population with significant 
differentials for subgroups of the population and for age and 
sex. The issue of the undercount in the United States census 
has become very important because of the many programs 
which now rely on census data— from the distribution of 
billions of dollars of Federal funds to the impact on political 
representation. Since this issue is becoming so dominant 
relative to the 1980 census, we are interested in learning 
about the experiences of other countries in regard to census 
undercounts, specifically: 

1. Do you routinely measure the completeness of coverage 
of population in your censuses? 

2. What kind of methodology is used? 

3. How extensive is the geographic detail for which 
coverage estimates are prepared? 

4. Are the census data adjusted and, if so, what specific 
uses are made of the adjusted series? 

5. Are census data (unadjusted) used for some purposes 
whereas adjusted figures are used for others? 

6. Is the question of census undercount an important 
issue in your country? 

I would very much appreciate hearing from you on this 
subject. Any publications or other material, e.g., memoranda, 
that your office has issued relative to the question on census 
undercounts and adjustments would be most welcome. 


Acting Director 
Bureau of the Census 

Floor Discussion 

In response to a question on how many people do not 
register in the required registration for voting in Australia, 
Mr. Doyle said the electoral office takes over corrected 
district data and works out how many people should be 
registered. In areas with over 5-percent nonregistration, 
regular followup surveys are carried out. Less than 1 percent 
of the population fails to register. There are a small number 
of illegal immigrants, who arrive by boat, but internal travel 
is difficult and immigration laws are strict. 

Another member of the audience asked when the 
corrected estimates of Australian population appear after a 
census. Mr. Doyle said that estimates based on the 1976 
census will still be produced through March 1982. Corrected 
estimates are produced continually. The 1981 census will be 
incorporated in the estimates series starting about 1982. In 
March 1982, an adjusted total population figure for the 1976 
estimate will be produced using the 1981 census counts. This 
estimate for 1976 will be adjusted for undercount, but it will 
also be used to adjust the de facto figure to a de jure 
basis— taking out visitors to Australia, putting back in 
Australians on short-term travel overseas, and counting 
everyone at their usual place of residence rather than where 
they were enumerated. 

It was emphasized that politicization did not seem to be 
an issue in most of the countries surveyed, even though the 
numbers from the census are used for political purposes in 
some of the developing countries. Politicization is thought to 
be growing in Canada, however. The census act there 
specifies that the population is what the Chief Statistician 
says it is for very large formula grant programs. As a result, 
the decision of the Chief Statistician is being questioned on a 
political level. The constitutional division of powers is similar 

in Canada to that in the United States, but the privileges of 
the provinces are far more jealously guarded. So it would be 
anathema to Canadians for the Federal Government to 
distribute money below the province level. To the extent 
that the issue of undercount is politicized, it is only to the 
province level, not below. The next step, though, is that the 
provinces have to distribute money within themselves. 
Statistics Canada may have to step in and help there. 

It was noted also that Canada seems to be the only 
country that consistently sticks to the reverse record check 
methodology. The demographic analysis technique does not 
work well in Canada— there is a great deal of measurable 
emigration. The Canadian PES grossly underestimated the 
undercount, whereas the reverse record check seemed more 
reliable. There is a great deal of discussion about how to 
break down the PES to the province level. 

It was felt that the Census Bureau is committed to 
demographic analysis at the national level and that the PES 
will be used below the national level. The PES tends to give 
geographic variation. Thus what the Bureau will use is a 
combination of the methods to come up with its estimates. 

One aspect of the PES operation in Australia was noted 
that makes it more reliable than it would be in Canada or 
America. The Australian PES is completed 3-5 weeks after 
the census day, which improves the matching capability 
much more than a 6-month break between the census and 
the PES in the United States. An early PES reduces matching 
problems but increases correlation problems; that is, one 
tends to miss the same people the census missed. However, 
there is a big difference in whether one has to adjust for a 
5-percent or a 40-percent difference between the PES and 
the demographic analysis estimates. 



Legal and Constitutional Constraints on Census 
Undercount Adjustment 

Donald P. McCullum 

Alameda County Superior Court, Calif. 


This paper will present the developing law on the 
utilization and adjustment of the decennial census of 
population. The permissibility of adjustments to the census 
undercount for apportionment of Representatives in 
Congress, and allowed deviations for federally funded 
programs will be reviewed. 

Feasible legal considerations by the Bureau of the Census 
to adjust the census undercount for the 1980 decennial 
census and the mid-decade census of 1985 will be suggested. 

Political, policy, and administrative considerations are not 
within the scope of this commentary. 


Article I, Section 2, of the United States Constitution 
provided at its initial ratification (1788): 

". . .Representatives and direct Taxes shall be apportioned 
among the several States which may be included within 
this Union, according to their respective Numbers, which 
shall be determined by adding to the whole Number of 
free Persons, including those bound to Service for a Term 
of Years, and excluding Indians not taxed, three-fifths of 
all other Persons. The actual Enumeration shall be made 
within three years after the first Meeting of the Congress 
of the United States, and within every subsequent Term 
of ten Years in such Manner as they shall by Law direct. 
The Number of Representatives shall not exceed one for 
every thirty Thousand, but each State shall have at Least 
one Representative;. . ." 

Article I, Section 2 was amended by Section 2 of the 14th 
amendment to the Constitution, in 1868 in part as follows: 

"Representatives shall be apportioned among the several 
States according to their respective Numbers, counting the 
whole Number of Persons in each State, excluding Indians 
not Taxed." Borough of Bethel Park v. Stans, 449 F.2d 
575, 578 (1971) USCA 3rd Cir (Penn). 

The Constitution embodied Edmund Randolph's proposal 
for a periodic census to ensure "fair representation of the 
people," an idea endorsed by (George) Mason as assuring that 
"numbers of inhabitants" should always be the measure of 

representation in the House of Representatives. Wesberry v. 
Sanders, 376 U.S. 1, 13-14; 84 S.Ct. 526, 533 (1964). 

A census is an official enumeration of the inhabitants with 
details of sex, age, family, etc., and the public record thereof; 
it is not merely a sum total, but an official list containing the 
names of all the inhabitants (citations). A "census" is not an 
estimate of the population. Union Electric Co. v. Cuivre 
River Electric Coop., Inc., 571 S.W.2d 790, 794 (Mo.) (1978). 

From the beginning, Congress has consistently provided 
for the mandated enumeration. At the First Congress, 
Session II, the First Decennial Census Act was adopted 
March 1, 1790, and, in part, provided: 

Chap. II. An Act providing for the enumeration of the 
Inhabitants of the United States. (a) 

Section I, Be it enacted by the Senate and House of 
Representatives of the United States of America in 
Congress assembled, That the marshals of the several 
districts of the United States shall be, and they are hereby 
authorized and required to cause the number of the 
inhabitants within their respective districts to be taken; 
omitting in such enumeration Indians not taxed, and 
distinguishing free persons, including those bound to 
service for a term of years, from all others; distinguishing 
also the sexes and colours of free persons, and the free 
males of sixteen years and upwards from those under that 
age; (1 Stat. 101) 

The current congressional enactment, at title 13, United 
States Code, follows in part: 

Section 141. Population and other census information 

(a) The Secretary shall, in the year 1980 and every 10 
years thereafter, take a decennial census of population as 
of the first day of April of such year, which date shall be 
known as the "decennial census date," in such form and 
content as he may determine, including the use of 
sampling procedures and special surveys. In connection 
with any such census, the Secretary is authorized to 
obtain such other census information as necessary, 
(d) Without regard to subsections (a), (b), and (c) of this 
section, the Secretary, in the year 1985 and every 10 
years thereafter, shall conduct a mid-decade census of 
population in such form and content as he may deter- 
mine, including the use of sampling procedures and special 



surveys, taking into account the extent to which infor- 
mation to be obtained from such census will serve in lieu 
of information collected annually or less frequently in 
surveys or other statistical studies. The census shall be 
taken as of the first day of April of each such year, which 
date shall be known as the "mid-decade census date" . . . 
(Oct. 17, 1976 P.L 94-521, Section 7(a), 90 Stat. 2461) 

The object of the census, as stated in the case of 
Loughborough v. Blake, 5 Wheat. 317, 320, 5 L.Ed. 98 (1820), is 
"to furnish a standard by which representatives, and direct 
taxes, may be apportioned among the several States which 
may be included within this union." 


At the outset, it should be understood that the significant 
litigation has been directed toward malapportionment within 
the several States, not among them. 

Stated another way in Meeks v. Avery, 251 F.Supp 
245,249-250 (D.C. Kansas) (1966): 

Reference in Article I, Sections 2 and 4; in Section 2 of 
the Fourteenth Amendment to the Constitution; and in 2 
USCA Section 2a, to the enumeration of the population 
of the various States have to do with the apportionment 
of representatives among the States, not within them. 

Serious question has not been raised as to apportionment 
of representatives "among the several States according to 
their respective numbers." There has likewise been no 
credible suggestion that anything but an actual decennial 
enumeration should be the basis for apportionment of the 
representatives "among the several States." 

The Constitution, as amended, mandated it and the 
Congress has legislated it. 

The rush of cases in the 1960's sought to correct 
malapportionment of State legislative districts (Baker v. Carr, 
369 U.S. 186, 82 S.Ct. 691, 7 L.Ed.2d 663 (1962); Gray v. 
Sanders, 372 U.S. 368, 83 S.Ct. 801, 9 L.Ed.2d 821 (1963); 
Reynolds v. Sims, 377 U.S. 533, 84 S.Ct. 1362, 12 L.Ed.2d 506 
(1964)) and malapportionment within State boundaries of 
congressional districts (Wesberry v. Sanders, 376 U.S. 1, 84 
S.Ct. 526, 11 L.Ed.2d 481 (1964); Kirkpatrick v. Preisler, 394 
U.S. 526, 89 S.Ct. 1225, 22 L.Ed.2d 519 (1969); Wells v. 
Rockefeller, 394 U.S. 542, 89 S.Ct. 1230, 22 L.Ed.2d 535 
(1969)). 


As soon as the apportionment issue shifts to "within State 
boundaries," we experience a full range of attempts to use 
something other than the published enumeration of the 
decennial census for reapportionment of congressional 
districts (within States). 

In Wesberry v. Sanders, supra, at page 18, the court said: 

While it may not be possible to draw congressional 
districts with mathematical precision, that is no excuse for 
ignoring our Constitution's plain objective of making 
equal representation for equal numbers of people the 
fundamental goal for the House of Representatives. That 
is the high standard of justice and common sense which 
the founders set for us. 

In Kirkpatrick v. Preisler, supra, at page 535, on a 
Missouri reapportionment statute creating congressional 
districts, the U.S. Supreme Court suggested that there may 
be instances where something other than an actual enumera- 
tion would suffice: 

... We recognize that a congressional districting plan will 
usually be in effect for at least 10 years and five 
congressional elections. Situations may arise where sub- 
stantial population shifts over such a period can be 
anticipated. Where these shifts can be predicted with a 
high degree of accuracy, States that are redistricting may 
properly consider them. By this we mean to open no 
avenue for subterfuge. Findings as to population trends 
must be thoroughly documented and applied throughout 
the State in a systematic, not an ad hoc, manner. . . . 

Large and obvious changes in population may be a basis 
for using information other than the enumerated decennial 
census. In Shalvoy v. Curran, 393 F.2d 55, 57, USCA 2 Cir 
(USDC Conn.) (1968), the court stated: 

. . . While census figures are a proper basis for population 
determination of a particular place in reapportionment 
cases over the ten years following its taking, under 
circumstances such as those here presented where such 
large and obvious changes in population have occurred as 
to be subject to judicial notice, the census report of seven 
years before is not immune from challenge. 

In Meeks v. Avery, 251 F Supp 245, 250 (USDC Kans.) 
(1966), the court found that use by the Kansas legislature of 
the 1964 State enumeration figures instead of the 1960 
Federal census as the basis for determination of population 
was nothing more than the exercise of judgment in the 
legislative process, and found no constitutional fault with the 
choice made. 

It should be noted that the Kansas count was an "actual 
head count of the inhabitants of the State," its accuracy was 
not questioned, and it was closer in point of time. 

Without citation of authority, Meeks v. Avery, supra, at 
page 250, suggests that the constitutional mandate and 
legislative enactments requiring enumeration for apportion- 
ment of Representatives relates only to among the States, 
and not to within them. 

In Exon v. Tiemann, 279 F Supp 603, 608 (USDC D. 
Neb. 1967), the court, in adjudicating a 1961 congressional 
redistrict plan void, suggested that estimates of the 1966 


population provided by the Bureau of Business Research of 
the University of Nebraska, rather than the 1960 census 
figures, were more valid: 


We do not intend to say in this opinion that the Bureau of 
Business Research estimates are the best standard to use if 
redistricting is attempted. We are saying that better 
evidence of population in 1967 is available than the blind 
use of the 1960 census. 

The case of Dixon v. Hassler, 412 F.Supp. 1036 (USDC W.D. 
Tenn.) 1976, on review of a 1970 congressional reappor- 
tionment statute, is the most recent court statement 1 made 
setting forth guidelines for population estimates in lieu of the 
decennial census: 

We have reached the conclusion that, in making this 
reapportionment ruling, this court is not confined as a 
matter of law to the 1970 Federal census figures and that 
we may consider the estimates tendered by the Shelby 
County Republican Party. Kirkpatrick v. Preisler, 394 
U.S. 526 at 535, 89 S.Ct. 1225 at 1231, 22 L.Ed.2d 519 
at 527 (1969) and the concurring opinion at 537, 89 S.Ct. 
at 1232, 22 L.Ed. 2d at 528, so indicate. 2 
Although we have determined that this court may 
consider the population estimates, before considering the 
validity of such estimates as compared with the 1970 
Federal census figures, we must first determine, as a legal 
proposition, the strength that the evidence supporting the 
estimates must have in order to overcome the presumptive 
correctness of the "head count" upon which the 1970 
Federal census was based. Exon v. Tiemann, 279 F.Supp. 
603 (D.Neb.1967), while holding that estimates may be 
considered, does not really deal with this question. 
Kirkpatrick, supra, at 535, 89 S.Ct. at 1231, 22 L.Ed.2d 
at 528, indicates that the estimates must have a "high 
degree of accuracy" if they are to overcome the pre- 
sumptive correctness of the prior decennial census. While 
it is difficult to express such propositions in quantitative- 
qualitative terms, we believe that the standard to be 
applied is that the decennial census figures will be 
controlling unless there is "clear, cogent, and convincing 
evidence" that they are no longer valid and that other 
figures are valid. This is a test that lawyers are familiar 
with and are accustomed to dealing with. 

At page 1041, the court made a finding that the Shelby 
County Republican Party "has not presented clear, cogent, 
and convincing proof" that the 1970 Federal census figures 

1 Affirmed on Appeal under the name of Republican Party of 
Shelby County v. Dixon et al., No. 76-65, 429 U.S. 934; 50 L.Ed.2d 
303; 97 S.Ct. 346 (November 8, 1976). 

2 By so ruling, we do not mean to say that, if these districts had 
been constitutionally apportioned following the 1970 census, they 
could be reapportioned based on population changes prior to the 
1980 Federal census. 

are not the best evidence of the current population of these 
districts and that the provisional estimates of the Bureau of 
the Census are the best evidence of such population. 

A case relating to other Federal programs and benefits and 
the undercount is City of Camden v. Plotkin (USDC D.N.J.) 
(Oct 31, 1978) 466 F.Supp. 44, 51. There the court held 
that the individual plaintiffs had standing to sue on alleged 
undercount in the Camden pretest, which would adversely 
affect the city's recognition as a prime sponsor under CETA. 

The court said: 

It is clear, then, that the plaintiffs' interest in a federally- 
funded job program is within the "zone of interest" 
contemplated by Congress when it mandated census 
counts between each decennial census. 

The population estimates at issue here were formulated by 
defendants pursuant to 13 USC Section 181 (Interim 
Current Data). The pretest was undertaken under the 
authority of 13 U.S.C. Section 193 which provides: 
"Preliminary and supplemental statistics . . . ." 

In the 1985 mid-decade census legislation, Congress took 
particular pains to make certain that the mid-decade census 
would not be used for apportionment of Representatives in 
Congress "among" the several States or "within" the several States. 

13 U.S. Code Section 141(e) (2): 

Information obtained in any mid-decade census shall 
not be used for apportionment of Representatives in 
Congress among the several States, nor shall such 
information be used in prescribing congressional districts. 

13 U.S. Code Section 195, Use of Sampling: 

Except for the determination of population for pur- 
poses of apportionment of Representatives in Congress 
among the several States, the Secretary shall, if he 
considers it feasible, authorize the use of the statistical 
method known as "sampling" in carrying out the 
provisions of this title. (October 17, 1976, 90 Stat. 

Only population and population characteristics data ob- 
tained in the most recent decennial census could be used by 
local government where the law conferring the benefits on 
local governments required the decennial census (section 


The constraints that may be imposed upon utilization of 
"sampling," estimates, statistical methods, projections, or 
trends in adjusting the census undercount are limited by 
legal, political, and administrative considerations. 


Except for the unambiguous mandate of the constitu- 
tional amendment XIV, section 2, 

Representatives shall be apportioned among the several 
States according to their respective numbers, counting the 
whole number of persons in each State, excluding Indians 
not taxed . . ., 

it appears that any "sampling" or statistical method other 
than direct enumeration could be used for congressional 
apportionment within States, including congressional redis- 
tricting and reapportionment of local legislative offices. 

The methods used must comport with the constitutional 
requirements, which provide for equal representation for 
equal numbers of people and permit only those limited 
population variances that are unavoidable despite a good 
faith effort to achieve absolute equality, or for which 
justification is shown. Kirkpatrick v. Preisler, supra, page 525. 

On the reapportionment of Representatives of Congress, 
within the States the decennial census figures must be used 
unless there is "clear, cogent, and convincing evidence" that 
they are no longer valid and that other figures are valid. 
Dixon v. Hassler, supra, 1040. 

It should be remembered that there is no articulated 
proscription from using "sampling" in carrying out the 
provisions of this title. 13 U.S. Code, section 195. 

If "sampling" includes statistical methods, which embrace 

estimates, projections, trends, and adjustments for under- 
count, and this "sampling" passes the "clear, cogent, and 
convincing evidence" test, there is no legal reason to suggest 
that an adjustment for undercount could be made contempo- 
raneously with the required submission of the decennial 
enumeration data to the President. 

This, then, would allow reapportionment "within" the 
several States to proceed based upon the most accurate data; 
the same data could be used in "other" programs or 
"benefits" for local governmental units. 13 U.S. Code, 
section 183(b). 

All federally funded programs and benefits based upon 
population characteristics can be subject to the "sampling" 
provision of 13 U.S. Code, section 195. 

The Census Bureau's application of criteria to enumerate 
will not be disturbed unless it is proven that the Bureau 
failed to apply the proper criteria in a reasonable manner or 
its application lacked a rational basis. Borough of Bethel Park 
v. Stans 449 F 2d 575, 579 (1971) (USCA 3rd Cir Penn.). 


The U.S. Constitution, case law, and the enactments of 
the Congress, under the heretofore specified circumstances, 
would permit and may require, if feasible, adjustment of the 
census undercount for all purposes except apportionment of 
Representatives of Congress "among" the several States. 

Floor Discussion 

The group agreed that a strong case had been made for 
adjusting for purposes of apportionment. However, the 
census counts have to be given to the Congress in January 
1981, and data with which to adjust are unlikely to be available by that 
time. Adjustment under such time constraints— except for 
body counts, missing questionnaires, and the like— is 
difficult. There is a need to adjust for equity, business uses, 
and the social sciences, although the case for business may 
not be too strong. Having two sets of figures should be no 
problem; the Census Bureau constantly adjusts retail sales 
data, for example, and the consumer price index is adjusted. 
The marketing field needs accurate figures for small areas. 
Although a 2-percent error here is acceptable, 10 to 15 
percent would be too large. A 2- to 3-percent undercount has 
been mentioned frequently. There was some concern should 
it reach 5 to 10 percent, however. This is not inconceivable 
should something like the proposed selective service registra- 
tion suddenly become linked with the census in people's 
thinking. The Bureau must decide within 3 to 4 months what 
type of adjustment it will make, possibly with a minimum 
cutoff for areas with insignificant adjustments. Further, the 
adjustment procedure must be statistically and legally defensible. 


There is no need for a final decision to come out of this 
conference nor is there need for a vote. Rather, the Bureau 
should have a sense of the meeting, which is that no matter 
what procedure is selected, research should not be reduced. 
Secondly, if an adjustment by age, sex, and race is chosen, 
this might affect per capita income data, but not necessarily 
reduce the undercounts for the age groups that are 
responsible for the income. In fact, the reverse might be true. 
Whether income would be adjusted or fixed would depend 
on the way the adjustment was carried out. It could affect all 
of the other data (employment, etc.), or the characteristics 
could be based on the unadjusted figures. 

It was noted that since political representation is at the 
heart of the issue, adjustments for minorities are an 
important concern. Nine of the 10 largest congressional 
districts that are losing population are predominantly black 
and have black Representatives. These districts are also the 
ones where the undercount is the highest and may be those 
hit hardest by reapportionment, while the areas gaining 
population have fewer blacks. The apportionment figures 
supplied to the Congress in January 1981 will not reflect the 
undercount, yet the need for adjustment is clear. With the 
history of black and other minority disenfranchisement, 
using unadjusted counts will constitute a "new disenfran- 
chisement." This conference should suggest solutions to the 
technical details and move toward adjustment for apportion- 
ment purposes. Adjustment down to the block level was felt 
to be desirable, but the procedures for doing this will require 
further research. 
further research. 

A question was raised on adjustments for Asians and 
Pacific Islanders. The Bureau indicated that counts will be 
reported for blacks and races other than white. There will be 
an undercount rate as well as a figure for blacks, so the 
residual (however unreliable) could be used as an adjustment 
for other races. The adjustment for the residual would 
increase in areas with fewer blacks, such as Hawaii. It was 
asked what should be done in the absence of a clear method, 
since no research on good ways to apply the synthetic 
method to Hispanics and illegal aliens has been 
completed. Of course, the Census Bureau will be the "expert" 
called in court cases. It was felt that if the Bureau knows 
what ought to be done, it should indicate what procedures 
will be followed and not leave the issue to the courts and the 

legislative branch to decide. 

There appeared to be little consensus about what should 

be done or who should do it, however. The timing of any 

adjustments is even more problematic. The distribution of 

funds is an ongoing process, so numbers can be adjusted for 

that use; but the distribution of seats is discrete and cannot 

be changed the following year. When small-area data become 

available (by April 1, 1981), nine States will have already 

passed their redistricting deadlines for apportionment. 

Thirty-four States must finish by the end of 1981, and only 

six can wait until 1982. This is a tight timetable, especially 

for the 8 to 10 States that the Census Bureau estimates will 

either gain or lose seats as a result of the 1980 census. 

The idea of adjusting for reapportionment is an example 

of the thesis that "adjustment begets more adjustment" and 

suggests a further adjustment problem. Congressional 

redistricting pays no attention to municipal boundaries. This 

implies that it will be necessary to adjust the counts for 4 

million blocks. In fact, if figures other than the initial counts 

are used, reapportioning State legislatures could occur every
2 years on the basis of perfectly good revenue-sharing
estimates.
The failure to use adjusted figures for apportionment does 

not violate the doctrine of equal protection because Article I 

and Amendment XIV of the Constitution both specify the 

counting of all inhabitants, and apportionment for the 435 

House seats is determined by that enumeration. 

It was observed that a 10-percent margin in a redistricting 

plan does not have to be justified, but the margin can go 

higher if justified (an example of this is the 16.4 percent in a 



Virginia case). It was reemphasized that using adjusted 
figures for apportionment among States has never been raised 
in the courts. The litigation has been addressed to within- 
State redistricting. The use of a 10-percent margin might 
dilute equal rights and violate the Voting Rights Act, 
however. Any margin, however small, would be too much if 
it deprives people of their rights. The courts will ask, in such 
cases, how "finely tuned" the census figures are. The Bureau 
imputes figures for "closeout" cases. If the Constitution 
limits apportionment among the States to the actual count, 
but adjusted data can be used within the States for 
redistricting, then the State figures will sum to a higher 
national total. However, there still are only 435 seats to be 

Should the adjustment decision be made by the courts 
instead of the Bureau, it was thought that the effect on 
professional credibility and morale in the Census Bureau 
might be negative. It might also negatively impact the court's 
reputation. If the courts decide on the adjustment rather 
than the Census Bureau, it will be harder to revise the 
methodology in the future. The Bureau did not practice 
imputation, as such, prior to the 1960 census; the court 
decided in East Chicago vs. Stans that the Bureau had acted 
within the meaning of the Constitution in carrying out the 
1970 census (in which there was imputation). This would 
appear to be at the edge of conformity to the law, however, 
and imputation beyond the bounds of 1970 would be open
to challenge.

Should the Census Count Be Adjusted for 
Allocation Purposes - Equity Considerations 

I. P. Fellegi 

Statistics Canada 



This paper examines a very special kind of census data 
use: Its legislated utilization as input to formulas on the 
basis of which funds are allocated from one level of govern- 
ment to another. To the extent that the census counts are 
subject to underenumeration, their use for this purpose 
represents a deviation from the legislated intent that 
(implicitly) assumes the counts to be free of error. 

Estimates of the undercount are often available as part 
of the evaluation of the census. These estimates are them- 
selves subject to sampling and other errors. Should they 
nevertheless be used to adjust the census counts for the 
purpose of legislated intergovernmental allocation of funds? 
The present paper concentrates on the narrow question of 
adjustment for this single purpose. However, it is important 
to keep in mind that in the real world of statistical policy 
a number of other questions must also be answered before 
deciding on a specific course of action: If the counts are 
adjusted for one purpose (but not others), can users cope 
with more than one "official" set of population figures? 
Should intercensal population estimates also be adjusted? 
Should current surveys use adjusted intercensal estimates 
in their ratio estimation (some of the current surveys have 
themselves formula allocation uses)? What would be the 
impact on electoral redistricting (another legislated use of 
census data which, however, requires considerably more 
geographic detail— at least in Canada— than the counts needed 
for intergovernmental fund allocations)? None of these related 
questions are examined in this paper. Even within the narrow 
context of a single application, i.e., legislated allocation of 
funds, there are several issues which have to be considered: 
The intent of the legislation; the danger of "politicizing" 
statistics or, more precisely, whether the danger of political
pressures on the census increases or decreases when the counts
are adjusted for underenumeration; and the long-term feed- 
back effect identified by Nisselson [3] , i.e., whether adjust- 
ing the count diminishes the incentive, particularly for 
minority groups, to work with the statistical office to 
improve the census the next time around. Again, in this 
paper, most of these considerations will be largely set aside, 
concentrating on the notion of legislative intent or "equity." 

Finally, allocation formulas seldom use only the census 
as their data source. However, for the sake of simplicity, we 
will examine only the impact of errors in the census counts 
on fund allocations. 

Formula allocations of funds from one level of govern- 
ment to another take a variety of forms. Many such payments 
can be broadly characterized as follows: 

(a) The national government (Federal, for the sake of
specificity) provides funds directly to the next level
of government (provinces, to be specific);

(b) The legislation implies, explicitly or implicitly, the
calculation of a per capita amount in province i, say
X_i. Then the total payment intended for province i
is calculated as

T_i = P_i X_i     (1)

where P_i is the total population of province i, or that of a
subgroup of the province, such as the number of university
students, or the number of persons below the poverty
line. The quantity X_i may depend on P_i.

In the presence of some underenumeration, the quantity
P_i is estimated as p_i. Applying the legislated allocation
formula, but using the known quantities p_i instead of the
unknown P_i, the quantity T_i would be estimated as t_i.
Therefore, the realized per capita payment in province i is no
longer X_i but rather

x_i = t_i / P_i     (2)


Note that in (2) above, the denominator is P_i, not p_i, since
the actual population in province i is P_i, not p_i; the de
facto per capita payment in a province is equal to the actual
payment (as computed using the census estimates), divided
by the actual total (or subgroup) population of the province.

The per capita deviation between the amount intended
by legislation and the amount actually received is

d_i = X_i - x_i     (3)

We shall define the notion of equity as a numerical measure
of the extent to which legislative intent is complied with.
Thus, this paper proposes as the measure of inequity due to
the census undercount the square of the deviations d_i
averaged over the total population (or subgroup):

I = Σ P_i d_i² / P     (4)

where

P = Σ P_i

Another way of interpreting (4) is to think of d_i
as the per capita underpayment or overpayment received by
a province, in which case I is the weighted average over all
provinces of the square of the per capita underpayments or
overpayments, the weights being the provincial populations
(or applicable subpopulation counts). Indeed,

d_i = X_i - x_i = (T_i - t_i) / P_i

Note that I does not measure only the extent to which
provinces are "shortchanged"; overpayments and under-
payments are given equal weight in that they both have the
same effect on the legislated intent of equity. I will be used
as a measure to determine whether the adjusted or unadjusted
census counts lead to less inequity. It is similar to the
measure proposed by Jabine [2].
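The measure (4) can be sketched numerically. A minimal illustration follows; the populations and payments are invented for the example, and the function name `inequity` is ours, not the paper's.

```python
# Population-weighted mean squared deviation between intended (X_i) and
# realized (x_i) per capita payments -- the inequity measure I of eq. (4).
# All figures below are hypothetical, for illustration only.

def inequity(P, X, x):
    """I = sum_i P_i * (X_i - x_i)**2 / P, with P = sum_i P_i."""
    total = sum(P)
    return sum(Pi * (Xi - xi) ** 2 for Pi, Xi, xi in zip(P, X, x)) / total

P = [1000.0, 2000.0, 500.0]   # true provincial populations
X = [10.0, 10.0, 10.0]        # legislated per capita payment
x = [9.8, 10.0, 9.5]          # de facto per capita payment
print(round(inequity(P, X, x), 6))
```

Note that an overpayment would contribute to I exactly as an equal underpayment does, matching the remark above.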


For the sake of simplicity of presentation, we must
simplify the large variety of allocation formulas actually
used. We will focus on two prototype, or model, formulas.

Fixed Per Capita Payment

According to this formula, the amount paid to a province
is directly proportional to the number of persons in the
target group: if c is the intended per capita payment, then

T_i = P_i c     (5)

Denoting by u_i the proportionate underenumeration in
province i (which, of course, may theoretically be negative
in the case of overenumeration),

u_i = (P_i - p_i) / P_i     (6)

we have t_i = p_i c, X_i = c, and x_i = p_i c / P_i, so that
d_i = u_i c and the inequity measure (4) becomes

I_1 = c² Σ P_i u_i² / P     (7)

A reasonable approximation of this allocation would be that
legislated by the U.S. Elementary and Secondary Education
Act, as described in the "Report on Statistics for Allocation
of Funds," published by the Office of Federal Statistical
Policy and Standards [4].

Note that since equity was defined as compliance with
legislative intent, I_1 ≠ 0 even if u_i is constant over all
provinces. In this case every province is shortchanged by the
same per capita amount. So while there is no differential
shortchanging of provinces, the legislative intent is violated
to the extent that the de facto per capita payment differs
from the legislated one.
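A short sketch of the fixed per capita case, again with invented figures, confirms that computing I_1 via the undercount rates u_i agrees with computing it directly from the per capita deviations:

```python
# Fixed per capita payment model, eqs. (5)-(7): with undercount rates
# u_i = (P_i - p_i)/P_i, the inequity reduces to I_1 = c^2 * sum(P_i u_i^2)/P.
# Hypothetical figures for illustration only.

c = 100.0                    # legislated per capita amount
P = [1000.0, 2000.0, 500.0]  # true populations
p = [980.0, 1990.0, 470.0]   # census counts (undercounted)

u = [(Pi - pi) / Pi for Pi, pi in zip(P, p)]
I1 = c**2 * sum(Pi * ui**2 for Pi, ui in zip(P, u)) / sum(P)

# Identical result from first principles: d_i = c - p_i*c/P_i = u_i * c
d = [c - pi * c / Pi for Pi, pi in zip(P, p)]
I1_direct = sum(Pi * di**2 for Pi, di in zip(P, d)) / sum(P)
assert abs(I1 - I1_direct) < 1e-9
print(round(I1, 4))
```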

Fixed Total Payments

Under this model, the Federal Government distributes to
the provinces a fixed amount, the total received by a given
province being proportional to its population. Then, if C is
the total to be distributed,

T_i = C P_i / P     and     t_i = C p_i / p

where p = Σ p_j is the total census count. Hence X_i = C / P
and x_i = C p_i / (p P_i). Writing ū for the weighted mean
underenumeration,

ū = Σ P_i u_i / P

and noting that p_i = P_i (1 - u_i) and p = P (1 - ū), we get

d_i = (C / P) (u_i - ū) / (1 - ū)

It is easy to see, using the notation of (6), that the corre-
sponding inequity measure, I_2, becomes

I_2 = [C² / (P³ (1 - ū)²)] Σ P_i (u_i - ū)²     (8)

A reasonable approximation of this model occurs in the
Canadian Fiscal Arrangements Act. According to this Act,
the revenue capacity of each province is quantified in a
fashion independent of the census. Let this measure be M_i
for the ith province. Then province i receives the amount

(P_i / P) Σ M_j - M_i

when the amount above is positive, and receives nothing if
the amount is negative. Since M_i is independent of the popu-
lation count P_i, the effect of underenumeration on receiving
provinces can be studied within the fixed total payment
model; in either case, the critical factor is the proportion of
the total population residing in a province, as opposed to
the absolute number in the fixed per capita payment case.
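Under the fixed total model only the population shares matter. A brief sketch (hypothetical figures) verifies formula (8) against a direct computation from the definitions:

```python
# Fixed total payment model, eq. (8): a fixed pot C is shared in proportion
# to population, so only the *shares* p_i / sum(p_j) matter.
# I_2 = [C^2 / (P^3 (1-ubar)^2)] * sum(P_i (u_i - ubar)^2). Hypothetical data.

C = 350000.0
Ptrue = [1000.0, 2000.0, 500.0]
p = [980.0, 1990.0, 470.0]

P = sum(Ptrue)
u = [(Pi - pi) / Pi for Pi, pi in zip(Ptrue, p)]
ubar = sum(Pi * ui for Pi, ui in zip(Ptrue, u)) / P
I2 = (C**2 / (P**3 * (1 - ubar)**2)) * sum(
    Pi * (ui - ubar)**2 for Pi, ui in zip(Ptrue, u))

# Cross-check against the definition: d_i = C/P - C*p_i/(sum(p)*P_i)
d = [C / P - C * pi / (sum(p) * Pi) for Pi, pi in zip(Ptrue, p)]
I2_direct = sum(Pi * di**2 for Pi, di in zip(Ptrue, d)) / P
assert abs(I2 - I2_direct) < 1e-9
print(I2 > 0)
```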


Adjusting the Counts Under Fixed Per Capita Payments

We assume that estimates of the undercount proportions
u_i are available which can potentially be used to adjust the
census counts. Let

û_i     (9)

be the available estimate of u_i. Let

E(û_i) = e_i     (10)

so that

b_i = e_i - u_i     (11)

is the bias of the estimates. We also assume that the popula-
tion counts p_i have a negligible variance.

If we were to adjust the population estimates, the adjusted
counts would have a residual underenumeration (which
could be negative) equal to u_i - û_i. So the measure of in-
equity, if the adjusted census counts are used for the fund
allocation, can be obtained from (7) by substituting u_i - û_i
for u_i:

Î_1 = c² Σ P_i (u_i - û_i)² / P     (12)

Ideally, one would like to find an adjustment that minimizes
Î_1. This is unlikely to be possible. A more modest but still
very relevant objective is to find an adjustment that reduces
the inequity, i.e., for which the difference I_1 - Î_1 is positive:

ΔI_1 = I_1 - Î_1 = (c² / P) [Σ P_i u_i² - Σ P_i (u_i - û_i)²]

Leaving out the positive factor in front of the brackets, we
will examine

ΔI_1' = Σ P_i u_i² - Σ P_i (u_i - û_i)²     (13)

The expression (13) above may not be estimable directly if
no unbiased estimates of u_i exist. However, after some
manipulation, we obtain, using the notation of (11),

E(ΔI_1') = Σ P_i e_i² - Σ P_i Var û_i - 2 Σ P_i b_i e_i     (14)

All terms of (14) are estimable except the last one. The last
term is equal to zero if the estimates of underenumeration
are unbiased, i.e., if b_i = 0 for all i. If this is not the case, a
simplifying assumption is needed.

Assumption A: If the estimates û_i of underenumeration
have a non-negligible bias, assume that

Σ P_i b_i e_i ≤ 0     (15)

Assumption A is likely to be satisfied since the estimates
û_i are typically positive and are usually underestimates of
the unknown underenumeration u_i (i.e., b_i ≤ 0). Even if the
quantities b_i are not negative for all i, assumption A is satis-
fied if the estimated underenumeration tends to be low
(b_i < 0) for those subgroups which the census finds difficult
to enumerate (even if b_i > 0 for groups which the census
finds easier to enumerate). In other words, assumption A
is likely to be satisfied if high values of u_i are accompanied
by negative values of b_i, even if low values of u_i are accom-
panied by positive b_i. Of course, assumption A can always
be satisfied if a sufficiently conservative set of estimates
û_i is used. Most methods used in practice to provide estimates
of u_i are likely to satisfy the assumption: Postenumeration
surveys, dual method estimates, and reverse record checks
(see a later section for a brief discussion of the latter). The
assumption may not hold for so-called analytic estimates of
underenumeration.

Dropping the last term of (14), define

ΔI_1'' = Σ P_i e_i² - Σ P_i Var û_i     (16)

Under assumption A,

E(ΔI_1') ≥ ΔI_1''

We will construct a test of the positivity of ΔI_1'', which will
therefore serve as a conservative test of the positivity of
ΔI_1'.

In order to estimate the two terms of ΔI_1'', we note
that if var û_i is an unbiased estimate of Var û_i, i.e., if

E(var û_i) = Var û_i

then

E(û_i² - 2 var û_i) = e_i² - Var û_i

so that

ΔÎ_1'' = Σ P_i (û_i² - 2 var û_i)     (17)

would be an estimator of ΔI_1'' if the quantities P_i were
known.¹ However, since their role is only to provide weights
in the averaging process, (17) is not likely to be sensitive to
even reasonably significant variation in the weights P_i.
Therefore, in place of P_i we can use the census population
counts p_i or, alternatively, the adjusted census estimates
p_i' = p_i (1 + û_i). We get our first test:

Test 1: Adjust the census counts if

Σ p_i (û_i² - 2 var û_i) > 0     (18)

A more conservative test can be constructed (and one that
does not require the values P_i) as follows:

Test 2: Adjust the census counts if for all i

û_i² - 2 var û_i > 0     (19)

Inequality (19) is, of course, equivalent to the condition
that the estimated rel-variance of all provincial underenu-
meration rates should be less than or equal to 1/2. It can
also be looked upon as a form of significance test of the
hypothesis that E(û_i) = 0.

Test 2 is suggestive of an alternative approach whereby
those provincial populations for which (19) is positive would
be adjusted, leaving the remaining ones unadjusted. If s_1 is
the set of provinces for which (19) is positive and s_2 is the
set of all other provinces, this alternative adjustment strategy
results in the following value of the change in inequity:

Σ over s_1 of p_i (û_i² - 2 var û_i)

which is, of course, positive.

The application of test 1 (or the stronger test 2) does not
guarantee that the adjusted census counts will, in fact, reduce
the inequity of allocation, since the inequity measures them-
selves are based on sample estimates. All one can really say
is that, if the sample size is large enough so that the sampling
distribution of the left-hand side of (18) is reasonably sym-
metric, it is more likely that the adjusted counts, rather than
the unadjusted ones, will result in less inequity. Put differ-
ently, if there are no penalties attached to adjusting the
counts, one could certainly use test 1 or test 2 as a sufficient
condition for adjusting. A much more conservative test
results if the basic strategy is to adjust only when the
evidence, in some sense, is overwhelming that the adjusted
estimates would reduce inequity. Under this strategy one
would adjust the estimates only if one were reasonably
certain that the unadjusted estimates lead to a higher measure
of inequity.

¹ It should be noted that ΔÎ_1'' is itself based on sample estimates,
so its value is not ΔI_1''. However, E(ΔÎ_1'' - ΔI_1'') = 0.

Normally, if one wanted to construct a test for the
positivity of ΔI_1', one would construct a test statistic based
on its estimate ΔÎ_1'' and the standard deviation of the latter.
However, since ΔI_1'' itself is a mixed expression involving
both parameters (e_i) and sample estimates (û_i), that approach
would lead to a test of the positivity of the common expected
value of ΔI_1' and ΔÎ_1'', not of ΔI_1' itself. We will therefore
consider the standard deviation of ΔÎ_1'' - ΔI_1'. Let

d = estimated standard deviation of ΔÎ_1'' - ΔI_1'

Then we have, under the usual assumption of approximate
normality and using assumption A,

E(ΔI_1' - ΔÎ_1'') ≥ 0

Prob (ΔI_1' - ΔÎ_1'' > -2d) ≥ 0.975     (21)

Thus if we also have

ΔÎ_1'' > 2d     (22)

then from (21) and (22) it follows that

Prob (ΔI_1' > 0) ≥ 0.975     (23)

Therefore (22) provides the desired test of (23). We need
to estimate, however, the standard deviation of ΔÎ_1'' - ΔI_1'.
Now

ΔÎ_1'' - ΔI_1' = 2 Σ P_i (û_i² - u_i û_i - var û_i)     (24)

and it is easy to show that, up to terms of order 1/n_i
(where n_i is the sample size in province i),

Var (û_i² - u_i û_i - var û_i) = e_i² Var û_i     (25)

which, to the same order of approximation, is estimated by

û_i² var û_i

Also, to the same order of approximation, it can be shown
that

Cov (û_i² - u_i û_i - var û_i, û_j² - u_j û_j - var û_j)
= e_i e_j Cov (û_i, û_j)     (26)

so that the signs of the covariance terms in the variance of
(24) are equal to those of Cov (û_i, û_j). We now make the
following assumption.

Assumption B: Assume that the estimates û_i are either
independent or not positively correlated with one another.

Under assumption B an overestimate of the variance of
ΔÎ_1'' - ΔI_1' is obtained by

4 Σ p_i² û_i² var û_i

So, we obtain the following test:

Test 3: Adjust the census counts if

Σ p_i (û_i² - 2 var û_i) - 4 √(Σ p_i² û_i² var û_i) > 0     (27)

If test 3 is satisfied, then under assumptions A and B the
odds are at least 0.975 that the adjusted counts result in less
inequity than the unadjusted ones.

It should be noted that assumption B is satisfied (in fact,
the terms of (24) are independent) in the case of postenumer-
ation surveys whose stratum boundaries respect provincial
boundaries. It is likely to be satisfied in the sense of non-
positive correlations between the û_i if the provincial estimates
û_i are based on domain estimators (since the sum of the
variances of estimates prepared for different domains is
usually larger than the variance of the estimate prepared
for the union of the domains). At any rate, assumption B
is not nearly as important as assumption A, since the
covariance terms are likely to be very small so long as the
sampling ratios used in the survey to estimate u_i are small.
In fact, they are of the order 1/P, so that neglecting them
is equivalent to neglecting the finite population correction.

Nevertheless, if assumption B cannot be accepted, an
even more conservative test results if the principle is applied
separately to every term on the left-hand side of (18). We
then obtain

Test 4: Adjust the census counts if for all i

û_i² - 2 var û_i - 4 û_i √(var û_i) > 0     (28)

Based on test 4, a conservative adjustment strategy might
involve adjusting only those provincial census counts for
which the left-hand side of (28) is positive.

In conclusion, two points may be noted. First, if the
estimates û_i are not based on sample data (such as in the
case of analytic estimates), then û_i = e_i and Var û_i = 0;
hence from (14) a condition for the adjusted counts to
result in lower inequity is

Σ P_i e_i² - 2 Σ P_i b_i e_i ≥ 0     (29)

for which assumption A provides a sufficient (though clearly
not necessary) condition.

Second, the entire development of this section is valid
(with obvious modifications) if the fixed per capita payment
c is constant within each province as opposed to its being
constant for the whole country. In other words, c can be
replaced by a set of constants c_i, as long as each of them is
determined without reference to the census population
estimates p_i. This is, in fact, the case with the U.S. Ele-
mentary and Secondary Education Act mentioned above.
Even more generally, much the same results and derivations
apply (with P_i or p_i replaced, respectively, by c_i P_i and
c_i p_i) under the allocation formula

T_i = c_i P_i + b_i

where c_i and b_i are provincial constants that do not depend
on P_i or p_i.

Adjusting the Counts Under Fixed Total Payments

If we were to adjust the census counts for the present
allocation model, the resulting inequity measure would be
obtained from (8) by substituting u_i - û_i for u_i:

Î_2 = [C² / (P³ (1 - ū + ū̂)²)] Σ P_i (u_i - û_i - ū + ū̂)²

where ū̂ = Σ P_i û_i / P. Now if

ū̂ ≥ 0

which should certainly hold, we get

I_2 - Î_2 ≥ [C² / (P³ (1 - ū)²)]
[Σ P_i (u_i - ū)² - Σ P_i (u_i - û_i - ū + ū̂)²]

Proceeding in a way analogous to the previous section, we
examine the bracketed quantity

ΔI_2' = Σ P_i (u_i - ū)² - Σ P_i (u_i - û_i - ū + ū̂)²     (30)

After some manipulation, we obtain

E(ΔI_2') = Σ P_i (e_i - ē)² - Σ P_i Var (û_i - ū̂)
- 2 Σ P_i (b_i - b̄)(e_i - ē)     (31)

where ē = Σ P_i e_i / P and b̄ = Σ P_i b_i / P. As in the previous
section, the last term is zero if the quantities û_i are unbiased
estimates of u_i. Failing that, we need an assumption
analogous to assumption A.

Assumption C: If the estimates û_i have a non-negligible
bias, assume that

Σ P_i (b_i - b̄)(e_i - ē) ≤ 0

Assumption C is satisfied if it is predominantly true that
wherever the estimated underenumeration is above average
(e_i > ē), its measurement is also worse than average (b_i < b̄),
and conversely. This is likely to be the case at least for most
survey-derived estimates of underenumeration: people whom
the census finds most difficult to count are, typically, also
more difficult to enumerate in the evaluation survey. Note
that assumption C is likely to be somewhat stronger than
assumption A of the previous section, since the former is
equivalent to

Σ P_i b_i e_i ≤ P b̄ ē

where the right-hand side is very likely to be nonpositive,
whereas assumption A only requires that the left-hand side
be nonpositive.

Under assumption C we readily obtain

E(ΔI_2') ≥ ΔI_2'' = Σ P_i (e_i - ē)² - Σ P_i Var (û_i - ū̂)

= Σ P_i e_i² - Σ P_i Var û_i - P ē² + P Var ū̂     (32)

All the quantities on the right-hand side of (32) are estimable
unbiasedly, except P_i. As before, if one accepts the biased
estimates p_i as serviceable (their role being that of a set of
weights), we get our first test:

Test 1: Adjust the census counts if

ΔÎ_2'' = Σ p_i (û_i² - 2 var û_i) - p ū̂² + 2 p var ū̂

= Σ p_i [(û_i - ū̂)² - 2 var û_i + 2 var ū̂] > 0     (33)

where p = Σ p_i and ū̂, var ū̂ are computed with the
weights p_i.

A test analogous to test 2 of the previous section could
immediately be deduced, but it would not be useful: In the
present case, it is not valid to contemplate adjusting the
census counts of only some of the provinces, since the pay-
ment to any province depends on the population of all
provinces.

The test above is designed to ensure that the census counts
are adjusted if the adjusted counts are more likely to lead to
a more equitable allocation of funds. As in the previous
section, we can construct a test that will lead to an adjust-
ment only if the adjusted counts are almost certain to lead
to a more equitable allocation. In order to do so, we need
to derive the variance of ΔÎ_2'' - ΔI_2', where

ΔÎ_2'' = Σ p_i [(û_i - ū̂)² - 2 var û_i + 2 var ū̂]
It is easy to verify that

ΔÎ_2'' - ΔI_2' = 2 Σ P_i (û_i² - u_i û_i - var û_i)
- 2 P (ū̂² - ū ū̂ - var ū̂)     (34)

The following formula, correct to order 1/n, can be
obtained after some algebra:

Var (ΔÎ_2'' - ΔI_2') = 4 Var [Σ P_i (e_i û_i - ē ū̂)]

= 4 Var [Σ P_i (e_i - ē) û_i]

= 4 Σ P_i² (e_i - ē)² Var û_i

+ 4 Σ over i ≠ j of P_i P_j (e_i - ē)(e_j - ē) Cov (û_i, û_j)     (35)

The sign of the last term of (35) is not easy to guess. If
the estimates û_i were based on simple random sampling,
Cov (û_i, û_j) would be negative and proportional to e_i e_j. In
that case, it is easy to verify that the last term of (35) is
negative. In the case of more complex designs, this will still
hold if the design effects applicable to the covariances are
reasonably equal. In general, the negativity of this term
cannot be proven, although it is likely to be true. If the last
term is negative, dropping it leads to a conservative estimate
of the variance. At any rate, for reasonable values of the
sampling ratio, the covariances will be of the order of 1/P,
so neglecting them is analogous to dropping the finite
population correction. Now, proceeding as in the previous
section, we get the following test.

Test 2: Adjust the census counts if

Σ p_i [(û_i - ū̂)² - 2 var û_i + 2 var ū̂]
- 4 √(Σ p_i² (û_i - ū̂)² var û_i) > 0     (36)


If test 2 is satisfied, then under assumption C and given 
moderate sampling ratios in all provinces, the odds are at 
least 0.975 that the adjusted counts result in less inequity 
than the unadjusted ones. 
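Tests (33) and (36) for the fixed total model can be sketched similarly. The figures below are hypothetical, and the variance of ū̂ is computed assuming independent provincial estimates, in the spirit of assumption B:

```python
# Fixed total payments: tests (33) and (36) depend on deviations of the
# estimated undercount rates from their weighted mean, ubar.
# Hypothetical inputs; var_ubar is the estimated variance of ubar.

import math

p = [980.0, 1990.0, 470.0]
uhat = [0.018, 0.004, 0.055]
var_uhat = [4e-5, 1e-5, 9e-4]

ptot = sum(p)
ubar = sum(pi * ui for pi, ui in zip(p, uhat)) / ptot
# With independent provincial estimates, Var(ubar) = sum (p_i/p)^2 Var(uhat_i)
var_ubar = sum((pi / ptot)**2 * vi for pi, vi in zip(p, var_uhat))

gain = sum(pi * ((ui - ubar)**2 - 2 * vi + 2 * var_ubar)
           for pi, ui, vi in zip(p, uhat, var_uhat))
sd2 = 4 * math.sqrt(sum((pi * (ui - ubar))**2 * vi
                        for pi, ui, vi in zip(p, uhat, var_uhat)))

print(gain > 0, gain - sd2 > 0)  # test (33), test (36) -> True False
```

As in the per capita case, the simple test passes with these figures while the conservative test does not, illustrating how much stronger the "almost certain" standard is.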


Finally, we may note that if the û_i are not sample estimates,
then û_i = e_i and Var û_i = 0; hence from (31) we get

ΔI_2 = Σ P_i (e_i - ē)² - 2 Σ P_i b_i (e_i - ē)     (37)

A sufficient condition for (37) to be nonnegative is that
assumption C holds. So, in this case, one should adjust the
census counts whenever that assumption holds.

As in the case of the fixed per capita payment model,
much the same development applies under the slightly more
general allocation model

T_i = C (P_i / P) + b_i

where C and b_i are constants that do not depend on P_i or
p_i.

The Reverse Record Check

In the censuses of 1961, 1966, 1971, and 1976, the main
vehicle used to measure the undercount has been the so-called
reverse record check (RRC). The methodology of this
vehicle, as used in the 1976 census, is briefly described below.
Additional detail is available in references [1] and [4].

For census purposes, persons are to be enumerated at 
their usual place of residence. There are, however, two 
groups that are included in the final census count but are not 
counted at their usual residence in Canada. The first group 
consists of diplomatic, military, and other personnel (e.g., 
merchant marine) living abroad. The second group corre- 
sponds to persons who were enumerated on a special form 
at a temporary address (hotels, motels, etc.) at the time of 
the census and who were missed at their usual residence. 

If a sample of persons could be drawn from sources in- 
dependent of the current census, and if the current address 
for each selected person were determined, one could directly 
search the current census file. Those who could not be found 
there would represent a sample of persons not enumerated 
at their regular address. The weighted total based on such 
a sample conceptually provides an estimate of the number 
of persons enumerated away from their regular residence, 
plus those missed by the census. Since the former count is 
independently known, the latter is readily estimated. 
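The arithmetic of the paragraph above can be put in a one-line sketch (figures hypothetical):

```python
# RRC estimation principle, sketched with invented figures:
# the weighted count of sampled persons not found at their usual residence
# estimates (enumerated elsewhere) + (missed by the census); subtracting the
# independently known count of persons enumerated away from home yields
# the estimated number missed.

weighted_not_found_at_home = 950_000.0  # weighted-up RRC sample total
enumerated_away_from_home = 150_000.0   # known from special census forms
missed_by_census = weighted_not_found_at_home - enumerated_away_from_home
print(missed_by_census)  # -> 800000.0
```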

The sample frame for the 1976 RRC comprised four 
nonoverlapping and exhaustive sources: 

(a) Persons enumerated at their regular residence in the 
1971 census (census frame); 

(b) Intercensal births (birth frame); 

(c) Intercensal immigrants (immigrant frame); and 

(d) Persons not enumerated at their regular residence in 
the 1971 census (missed frame). 

The "missed frame" in its totality exists only concep- 
tually. However, as explained above, the RRC of the previous
census resulted in a sample of such persons. The first time
one carries out an RRC, frame (d) might not be available;
the second time around, it would fail to include persons
missed in both of the two preceding censuses; generally, 
the nth time around, frame (d) fails to include only persons 
missed by each of the n preceding censuses. Since being 
missed is significantly affected by age, the proportion of 
persons missed by two or more consecutive censuses is prob- 
ably of rapidly diminishing significance. 

Having selected a sample of persons from the four sources,
an extensive effort was mounted to locate their current
addresses (tracing). The most dogged determination resulted
in 97 percent of selected persons being traced, including
those confirmed to have died or emigrated since 1971. The
failure rate of tracing, however, varied from frame to frame:
2.9 and 2.6 percent for the census and birth frames, respec-
tively; 8.5 and 5.1 percent for the immigrant and missed
frames. The untraced persons were eventually treated as
nonresponse, in a fashion analogous to other sample surveys:
within each frame an appropriate ratio adjustment was used.
It is important to note, however, that
persons listed in frames for which tracing was more difficult 
had a higher rate of being missed by the census, leading to 
strong circumstantial evidence that those not traced were 
likely missed at an even higher rate. Thus it is very probable 
that, even after weighting for this "nonresponse" separately 
for the different frames, the RRC estimates have a higher 
bias (underestimating the underenumeration) whenever the 
underenumeration itself is higher (assumptions A and C). 
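The within-frame ratio adjustment described above can be sketched as follows. All counts are hypothetical; the point is only that each frame's traced cases are reweighted to stand in for its untraced cases:

```python
# Within-frame ratio adjustment for tracing failures. Counts are invented.
# Each frame's traced cases are reweighted by (selected / traced), so
# untraced persons are represented by traced ones from the same frame.

frames = {
    # frame: (selected, traced, missed-by-census among traced)
    "census":    (20000, 19200, 300),
    "birth":     (6000,  5540,  160),
    "immigrant": (4000,  3576,  220),
    "missed":    (3000,  2712,  500),
}

est_missed = 0.0
for selected, traced, missed in frames.values():
    weight = selected / traced  # ratio adjustment within the frame
    est_missed += weight * missed

naive = sum(m for _, _, m in frames.values())
print(est_missed > naive)  # reweighting raises the estimate -> True
```

If, as the paper argues, the untraced are in fact missed at an even higher rate than the traced within the same frame, this adjustment still understates the undercount, which is the conservatism invoked above.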

After completion of tracing, a thorough search of census 
records was carried out to determine whether the selected 
persons were enumerated at their respective traced addresses. 
In cases where the tracing indicated that the selected person 
died prior to the census, a search of the death register was 
carried out. All persons not found in either the census or 
the death register were further followed up. The objectives
of the followup were twofold:

(a) To confirm that the traced address was correct, and/ 
or to obtain other addresses where the person may 
have been enumerated in the census; 

(b) To obtain at the same time a number of census char- 
acteristics for the persons concerned which, should 
they turn out to have been missed by the census, 
would enable us to provide basic tabulations on the 
characteristics of persons missed by the census. 

Persons who could not be traced in the followup opera- 
tion were added to the untraced (nonresponse) category. 
This raised the untraced proportion to 4.8 percent overall, 
but still left considerable differences among the four
frames: 4.0 percent in the census frame, 7.6 percent in the 
birth frame, 10.6 percent in the immigrant frame, and 9.6 
percent in the missed frame. Undoubtedly, the approach of 
following up all those traced persons who could not readily 
be found as enumerated in the census is correct, but it also 


contributes to the eventual estimates of underenumeration 
being conservative. 

Appendix 1 shows the number of selected persons in each 
final status category (enumerated, missed, deceased, emi- 
grated, and tracing failed) for each of the four frames. The 
counts are unweighted. The overall proportion of missed
persons is higher than the final estimates for Canada for
two main reasons: the percent "missed" is actually the
proportion of persons not enumerated at home (i.e., un-
adjusted for persons enumerated at a temporary residence),
and the sample design included oversampling of young males
in the census frame (where underenumeration was expected
to be high), so the unweighted proportion is an overestimate.
Weighted and adjusted estimates of the undercount for
various population and household groups are available in
reference [4].


Appendix 2 shows the basic results of the RRC by province; 
appendix 3, by age and sex. Breakdowns by other 
classifications are available in reference 4, including the pro- 
portion of households missed by their characteristics. The 
sample size used was slightly over 33,000 persons nationally, 
from the four frames combined. 

For purposes of allocating fixed per capita amounts to 
the 10 provinces, the overall tests 1 and 3 both indicate that 
adjustment is appropriate. This is also the case for test 2 
(the term-by-term version of test 1) for every province. Test 
4 is positive only for the four largest provinces (Ontario, 
Quebec, British Columbia, Alberta) and New Brunswick 
(because of the high value of u_i). Test 1 is very "comfortably" 
satisfied: the sample could be reduced by a factor of 
36 and adjustment would still be indicated (i.e., even with 
an overall sample size of less than 1,000 distributed over 10 
provinces). Test 3 would be positive even with a sample size 
reduced by a factor of 4 (less than 8,000 selected persons). 

In the fixed total-cost case, tests 1 and 2 are 
both positive. Test 2 would be positive even with a 60- 
percent reduction of the sample size actually used. Test 1 
would be positive with a sample size one-fifth as large as 
actually used, i.e., with about 6,600 persons. If one were to 
construct a term-by-term equivalent of tests 1 and 2 (as in 
the case of the fixed per capita allocation), the corresponding 
tests would all be positive in the case of test 1, except 
for New Brunswick (whose estimated underenumeration is 
very close to the national average); in the case of test 2, none 
of the term-by-term equivalents is positive. 

It is interesting to consider one of the major Canadian 
Federal-provincial transfer payment schemes. Skipping the 
detailed analysis of the legislation, it turns out that the 
relevant formulas to use are provided by the fixed total 
payment tests, but with the summation extended over only seven 
provinces (Newfoundland, Prince Edward Island, Nova 
Scotia, New Brunswick, Quebec, Manitoba, Saskatchewan), 
while still retaining the national average u. Applying tests 1 
and 2 of the fixed total-cost model in this fashion, they are 
still positive, but test 2 just barely so. An overall proportional 
reduction of the sample by even 1 percent would render 
the sign of test 2 negative. 


The question naturally arises how to allocate the sample 
designed to measure the undercounts u_i so as to make the 
adjustments result in maximum precision. If the objective 
is to maximize the reduction in inequity after the adjustment, 
then one wants to maximize test 1 (in both the fixed 
per capita allocation and fixed total-cost cases). 

Let us assume, for the sake of simplicity, that the variance 
of u_i decreases inversely with n_i. Then denote 

    var u_i = v_i / n_i 

where v_i includes all applicable design effects. In the case of 
the fixed total-cost allocation, we will neglect the effect of 
var u in formulas (33) and (36). (Its numerical impact is 
extremely small and most unlikely to influence optimal 
sample allocation.) 

It is now easy to verify that the optimal allocation (with 
fixed overall sample size n) that maximizes the value of test 
1 is 

    n_i = n √(p_i v_i) / Σ_j √(p_j v_j)                    (38) 

The same result applies in the case of both fixed per capita 
and fixed total-cost allocations. If the differences in the 
provincial design effects and in the proportions u_i are not 
large, a reasonable approximation of the above is to allocate 
the sample in proportion to the square roots of the provincial 
populations. This has merit otherwise as well, as a reasonable 
compromise between designing the sample for obtaining the 
best national estimate of underenumeration (n_i proportional 
to p_i) and obtaining equal reliability for every u_i (n_i equal 
for all i). 
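The test-1-optimal rule (38) can be sketched in a few lines. The population shares and the constant v_i below are illustrative assumptions, not the actual 1976 figures:

```python
import math

def allocate(n, p, v):
    """Allocate a total sample n in proportion to sqrt(p_i * v_i),
    as in the test-1-optimal rule (38)."""
    w = [math.sqrt(pi * vi) for pi, vi in zip(p, v)]
    total = sum(w)
    return [n * wi / total for wi in w]

# Hypothetical province population shares and a roughly constant v_i,
# in the spirit of the paper's numerical example.
p = [0.36, 0.27, 0.11, 0.09, 0.04, 0.04, 0.04, 0.03, 0.015, 0.005]
v = [0.02] * 10
sample = allocate(33000, p, v)
# With constant v_i, the shares reduce to sqrt(p_i) / sum_j sqrt(p_j),
# i.e., allocation proportional to the square roots of the populations.
```

Note how the square-root rule compresses the range: the largest hypothetical province has 72 times the share of the smallest but receives only about 8.5 times the sample.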

If we want to maximize the chance of being able to adjust 
the census counts with security, we would want to allocate 
the sample to maximize the value of test 3 in the case of 
fixed per capita allocation and of test 2 in the case of fixed 
total-cost allocation. 

It can be verified that the following allocation would 
achieve this second objective: 

    n_i = n a_i / Σ_j a_j                                  (39) 

where 

    a_i² = t² p_i² x_i² v_i + p_i v_i 

and where, further, 

    x_i = u_i in the fixed per capita allocation case 
        = u_i − u in the fixed total-cost allocation case 

and t is the positive root of the equation below 

    t² = Σ_j p_j² x_j² v_j / [(Σ_j · · ·)(Σ_j a_j)] 


The right-hand side above involves t, but the equation can 
be solved iteratively, starting with an initial value of t=0. In 
fact, the convergence appears to be very rapid (the first com- 
puted value of t in several examples was correct to three 
significant digits). 

If t were equal to zero, the allocation of (39) would reduce 
to that of (38). For large values of t (and if the values 
x_i and v_i do not vary much among the provinces), the 
allocation would be close to proportional to the provincial 
populations p_i. 
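These two limits can be checked numerically. The form a_i² = t² p_i² x_i² v_i + p_i v_i used below is an assumed reading of formula (39), and all inputs are hypothetical:

```python
def shares_39(p, x, v, t):
    """Sample shares n_i/n = a_i / sum(a_j), with the assumed form
    a_i^2 = t^2 p_i^2 x_i^2 v_i + p_i v_i."""
    a = [(t**2 * pi**2 * xi**2 * vi + pi * vi) ** 0.5
         for pi, xi, vi in zip(p, x, v)]
    s = sum(a)
    return [ai / s for ai in a]

# Hypothetical shares with constant x_i and v_i.
p = [0.4, 0.3, 0.2, 0.1]
x = [0.01] * 4
v = [0.02] * 4

near_zero = shares_39(p, x, v, t=0.0)   # proportional to sqrt(p_i)
very_large = shares_39(p, x, v, t=1e9)  # approaches proportionality to p_i
```

At t = 0 the second term dominates and the shares equal those of (38); for very large t the first term dominates and, with x_i and v_i constant, the shares approach p_i itself.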

The following example illustrates the differences between 
the two allocations. Let 

    x_i = 0.01 (about the average value of |u_i − u| in the 
          Canadian reverse record check; see appendix 2); 

    v_i = 0.02 (close to the average value of u_i(1 − u_i) 
          in the reverse record check; see appendix 1). 

Then t = 7.14 × 10⁻⁴ when n = 33,000 (the approximate 
reverse record check sample size). Table 1 illustrates the 
proportion of the sample to be allocated to each of the 10 
provinces when n_i is proportional, respectively, to √p_i, a_i, 
and p_i. As expected, the allocation that is proportional to 
a_i is in between the allocations proportional to p_i and √p_i. 

Table 2 shows the realized values of tests 1 and 2 with 
the parameters as indicated above (but with the value of var 
u neglected in the case of test 2). 

As usual, the optima appear to be broad. The allocation 

Table 1. Alternative Sample Allocations 

Province                  Rev. rec. check 
Newfoundland                   0.057 
Prince Edward Island           0.026 
Nova Scotia                    0.070 
New Brunswick                  0.063 
Quebec                         0.191 
Ontario                        0.220 
Manitoba                       0.077 
Saskatchewan                   0.073 
Alberta                        0.104 
British Columbia               0.120 

[The columns for allocations proportional to √p_i, a_i, and p_i are not recovered.] 

Table 2. Realized Values of Tests 1 and 2 Under Alternative 
Allocations (fixed total payment model) 

Rows: test 1, test 2; columns: allocations proportional to √p_i, 
a_i, and p_i, and the actual reverse record check allocation. 
[Values not recovered.] 


proportional to √p_i maximizes test 1, while the allocation 
proportional to a_i maximizes test 2. There is not much to 
choose among them. The actual reverse record check allocation 
was close to the √p_i allocation, but adjusted for two 
considerations: (a) to provide acceptable estimates of 
underenumeration for every province (thus the significant upward 
sample size adjustment for Prince Edward Island); and (b) 
to take into account the forecast values of u_i and v_i (thus 
also the significant upward adjustment in British Columbia, 
where both u_i and v_i were forecast, and turned out, to be 
significantly higher than elsewhere). 

In concluding this section, it is perhaps worth once again 
pointing out the somewhat curious phenomenon that the 
optimal sample allocation that maximizes test 1 (the esti- 
mated reduction in the inequity measures between the 
unadjusted and adjusted allocations) is different from the 
optimal sample allocation that maximizes test 2 (which, 
roughly speaking, ensures that the adjusted census counts 
actually decrease the inequity with a high probability). 


As emphasized in the introduction, the present paper does 
not purport to provide a tool for definitively determining 
whether the census counts should be adjusted, even for 
purposes of fund allocation and even if the appropriate tests 
are positive. The tests deal only with the issue of equity, 
and then only for formulas that depend on census counts in 
a direct and untransformed fashion. This is not the case for 
the Canadian Fiscal Arrangements Act payments, except in 
census years; in all other years, the formula uses intercensal 
population estimates. It is yet to be determined whether 
suitable intercensal population estimates can be prepared 
starting from the adjusted census counts. 

Some legislated fiscal transfers depend on counts of a 
subgroup of the population (e.g., students, or children in 
low income families). While one of the advantages of the 
reverse record check methodology, as described in the 
previous section, is its ability to produce breakdowns of the 
number of persons missed by the census by their charac- 
teristics (as such it is different from, for example, analytic 
estimates of the underenumeration rate), the currently used 
sample sizes may not be adequate for adjusting the census 
counts for relatively small subgroups of the population. 

Intercensal population estimates are also used in the 
weighting of current household survey data through ratio 
estimates. These survey estimates may, in turn, be used to 
determine large transfers of funds. This is the case, for 
example, in the Canadian Labour Force Survey, whose 
estimates of the unemployment rate (in each of some 50 regions) 
determine the unemployment insurance benefits. Since 
unemployment is significantly higher in the category where 
the census undercount is also worst, i.e., young males, it is 
conceivable that if the ratio estimate could be based on 
adjusted intercensal population estimates, the benefits to be 
received by some Canadians could be affected. However, 
the appropriate adjustment depends not only on the method- 
ology of producing adjusted intercensal estimates but also 
on being able to adjust (even for the census year) by pro- 
vince and age-sex groups, which the current RRC sample 
size may not support. Research is currently underway to 
investigate the feasibility of producing synthetic adjustment 
factors from separate sets of estimates of underenumeration 
by province and by age-sex groups. 

Should adjusted census counts be used for one legislated 
use (such as the Federal Provincial Fiscal Arrangements), 
can the use of unadjusted counts be defended for other 
legislated uses (such as the determination of unemployment 
insurance benefits)? 

Another problem raised by Nisselson [3] is that, by ad- 
justing the census counts, we may remove an incentive that 
otherwise might motivate national associations of difficult- 
to-enumerate groups, such as certain minorities, to urge 
their members to cooperate with the census. 

The above is only a very partial list of policy issues that 
have to be addressed before a decision can be made on 
whether the census counts should be adjusted and, if so, 
for what purpose. One thing is certain: as Keyfitz [3] 
pointed out, the issue should be decided publicly and before 
the census is taken, so that the arguments for and against 
adjustment can be considered in the least charged atmosphere. 



Appendix 1. Number of Selected Persons, by Final Status 
Category and Frame (unweighted counts) 

Rows: total, enumerated, missed, deceased, emigrated, 
tracing failed. Columns: census frame, birth frame, 
immigrant frame, missed frame, all frames. 
[Counts not recovered.] 

Appendix 2. Undercoverage Rate and Number of Persons 
Missed, by Province: 1976 Census 

Columns: undercoverage rate (percent); estimated number 
of persons missed; 1976 census count; 1976 census count 
adjusted for undercoverage.¹ ² 
[Most table values not recovered.] 

Note: Yukon and Northwest Territories excluded. 

1 The marginal totals or percentages may differ slightly from the sum of individual totals or percentages due to rounding. 

2 The standard error figure for the corresponding estimated number of missed persons also applies to these totals. 

3 Estimate with high relative standard error. 

Appendix 3. Undercoverage Rate and Number of Persons 
Missed, by Age and Sex: 1976 Census 

Columns and notes as in appendix 2. 
[Most table values not recovered.] 


1. Gosselin, J. F., Theroux, G., and Lafrance, C. "Reverse 
Record Check: Evaluation Report." Mimeographed. 
Statistics Canada, March 1979. 

2. Jabine, T. "Equity and the Allocation of Resources Based 
on Sample Data." Paper presented at the Annual Meeting 
of the American Statistical Association, 1976. 

3. Nisselson, H. "Comment on 'Information and Allocation: 
Two Uses of the 1980 Census' by Nathan Keyfitz." The 
American Statistician (May 1979), 50-52. 

4. Theroux, G., and Gosselin, J. F. "Reverse Record Check: 
Basic Results on Population and Household Undercoverage 
in the 1976 Census." Mimeographed. Statistics Canada, 
March 1979. 

5. U.S. Department of Commerce, Office of Federal 
Statistical Policy and Standards. "Report on Statistics for 
Allocation of Funds." Working Paper 1, March 1978. 

Implications of Equity and Accuracy for 

Undercount Adjustment: 

A Decision-Theoretic Approach 

Bruce Spencer 

Northwestern University 


The 1980 census will be subject to undercount, and esti- 
mates of the undercount will be prepared by one or more 
techniques. The presence of error in the population figures 
implies that various subgroups of the population may be 
partially denied societal benefits to which they are entitled. 
These benefits include equal political representation as well 
as monetary grants-in-aid under numerous Federal programs. 
For example, errors in the 1970 census counts for States 
could have caused one or more States to lose a seat in the 
U.S. House of Representatives [13, 14] . There are also over 
100 Federal programs that allocate funds at least partly on 
the basis of population data. The largest such program is gen- 
eral revenue sharing, which currently allocates nearly $7 
billion per year to State and sub-State governmental 
jurisdictions. Some programs, such as CETA (Comprehensive 
Employment and Training Act), use population data as 
thresholds across which areas must pass: to qualify as a prime 
sponsor under CETA, an area must comprise at least 100,000 
persons. 

Given an estimate of the undercount, one may adjust the 
census count and then recalculate the allocations of House 
seats or general revenue sharing or other funds to govern- 
ments or other groups [4, 11-14, 16, 17]. The observation 
of differences between the adjusted allocations (i.e., the 
allocations based on data adjusted for estimated undercount) 
and the unadjusted allocations has given rise to perceptions 
of inequity and the desire by many to have the data adjusted 
for undercount. 

Unfortunately, the estimates of undercount are themselves 
subject to error. As a consequence of this error, the adjusted 
allocations may be further from the target 1 than are the 
unadjusted allocations (see the section below). Thus, adjust- 
ment may be inequitable for some participants in the alloca- 
tion process. If the adjustments are to be equitable overall, 
we must carefully consider both the accuracy of the esti- 
mates of undercount and what in fact we mean by equity 
in this context. 

This paper will first address considerations of accuracy 
and equity separately. We next consider how to make ad- 
justments that maximize equity, subject to the accuracy of 
the estimates of undercount and given criteria of equity. 
Illustrative calculations will be presented. 

1 Here, target refers to the allocations that would be calculated 
if perfect data were available. 

The purposes of the paper are to (1) identify the questions 
of what we mean by equity and to formulate equity concerns 
in useful ways; (2) heuristically consider, using simple ex- 
amples, what implications various concepts of equity have 
for undercount adjustment; and (3) identify a few questions 
for further consideration and research. Our considerations of 
equity will be restricted to the area of fund allocations based 
on census data. 


Estimates of undercount are based on data and on 
assumptions, and thus they are subject to error. The potential 
significance of this error is illustrated by the following example. 

Example 1 

Suppose a fixed sum of money is to be allocated 
among States in proportion to total State population 
(for convenience we refer to the District of Columbia as 
a State). Then the fraction of the total going to each 
State equals the estimate of the ratio of State population 
to total population. Note that if all States had the same 
relative undercount (undercount expressed as a fraction of 
true population), these ratios would be unaffected. How- 
ever, it is estimated that States do not all have the same 
relative undercount; those whose relative undercount is 
less than the national relative undercount are overallo- 
cated funds, and those whose relative undercount is 
greater than the national relative undercount are under- 
allocated funds. If adjusted population figures are used, 
some States will gain money and some will lose, according 
to whether the percent increase in State population from 
adjustment is greater or less than the percent increase in 
national population from adjustment. 
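The invariance under a uniform relative undercount, and the shift under a differential one, can be seen in a small sketch; the three-state populations and undercount rates below are hypothetical:

```python
def shares(counts):
    """Fraction of a fixed sum allocated to each state under
    population-proportional allocation."""
    total = sum(counts)
    return [c / total for c in counts]

true_pop = [1_000_000, 2_000_000, 3_000_000]

# A uniform 2-percent relative undercount leaves every share unchanged,
# since numerator and denominator shrink by the same factor.
uniform = [p * 0.98 for p in true_pop]

# A differential undercount (5 percent in state 1, 1 percent elsewhere)
# shifts part of the fixed sum away from the worst-counted state.
differential = [true_pop[0] * 0.95, true_pop[1] * 0.99, true_pop[2] * 0.99]
```

Here state 1's true share is 1/6, unchanged under the uniform undercount, but its share falls under the differential undercount because its relative undercount exceeds the national average.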

The Census Bureau has used alternative methods to 
prepare estimates of undercount for States in 1970 
[14]. For each of these alternative methods, a set of 
adjusted population counts for States may be simply 
obtained by adding the estimate of undercount to the 
1970 census count for each State. It is interesting to 
compare the changes in allocation under two alternative 
adjustments, based on undercount estimates derived by 
the basic synthetic method and by the composite-2 method. 
(See fig. 1 [14].) 



[Scatter diagram: for each State, the percent change under 
the basic synthetic method (vertical axis) is plotted against 
the percent change under the composite-2 method (horizontal 
axis); legend: • one State, x two States.] 

Figure 1. Percent changes in population-based distribution of a fixed sum of money 
among States when 1970 census counts are adjusted by the basic synthetic 
method and composite-2 method. Adapted from National Research Council, 
Counting the People in 1980: An Appraisal of Census Plans. Washington, D.C.: 
National Academy of Sciences, 1978, SPRR table VIII-A. 


This scatter diagram illustrates two important points. 
First, the magnitudes of the changes in allocation can 
differ under alternative estimates of undercount. Here 
the changes under the composite-2 method (horizontal 
axis) are generally more extreme than those under the 
basic synthetic method (vertical axis). The second point 
to be appreciated is that the directions of the adjustments 
can differ under alternative estimates of undercount. Here 
the directions of adjustment under composite-2 method 
and under the basic synthetic method differed for 17 
States: Alaska, Arizona, California, Hawaii, Idaho, Illinois, 
Kentucky, Louisiana, Maine, Maryland, Michigan, Nevada, 
New Jersey, New Mexico, South Dakota, Vermont, and 
West Virginia. If we consider other estimates of under- 
count as well [14, tables VII-D or F-1], we may note 
that the directions of adjustment also vary for Alabama, 
Colorado, and New York. 

In example 1, we do not know which method gives the more 
accurate estimates of undercount for each State. A point 
to be appreciated is that whatever estimates of undercount 
are used to adjust, they will be subject to error, and there is 
a probability that the adjustments will (1) move some of the 
allocations farther from the target, and (2) result in an under- 
allocation to a State (or other group) that was previously 
not subject to underallocation. In order to assess these 
probabilities, it is necessary to have quantitative estimates of 
the accuracy of the estimates of undercount. 

Estimates of accuracy may be completely specified as 
probability distributions for the estimates of undercount 
(or equivalently, for the alternative population estimates, 
where these are equal to the sum of the population count 
and the estimated undercount). For example, we might 
believe that the estimate of the relative undercount of an area 
provided by a certain method has a normal distribution with 
some mean and a standard deviation of .01. Deriving these 
individual (i.e., marginal) probability distributions is a 
challenging task. Deriving a joint probability distribution 
will be even more difficult. 
Different techniques will be needed to estimate the 
marginal distributions for estimates provided by different 
methods. For example, standard statistical theory may be 
used to estimate the variances of dual systems estimates [1] , 
but estimates of biases (e.g., response correlation bias and 
matching bias; see [9]) which may dominate the variances re- 
quire other approaches. To estimate the distributions for the 
demographic method estimates requires different techniques. 
The demographic method uses complicated models and 
diverse data sources to construct estimates of undercount. 
Some of the parameters in the models represent guesses, and 
the data in the models are used in complex ways. A possible 
but untried approach to estimate the distributions of the 
demographic estimates is to write out the models completely 
and explicitly, specify probability distributions for para- 
meters and data used in the models, and use the delta 
method to produce estimates of bias and variance. 
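A Monte Carlo analogue of the delta-method approach described above is easy to sketch. The toy model (undercount = births − deaths − census count) and every error level below are invented for illustration only, not taken from the demographic method itself:

```python
import random

random.seed(12345)

def undercount_estimate():
    """One draw from an assumed toy demographic model: each input is
    perturbed by a normal error reflecting its assumed accuracy."""
    births = random.gauss(105_000, 1_000)
    deaths = random.gauss(3_000, 300)
    census = random.gauss(100_000, 500)
    return births - deaths - census

draws = [undercount_estimate() for _ in range(20_000)]
mean = sum(draws) / len(draws)
sd = (sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)) ** 0.5
# mean approximates the model's expected undercount (2,000 under these
# assumptions); sd approximates the standard error implied by the
# assumed input error levels.
```

Once the model is written out explicitly, the same specification of input distributions serves either the delta method or simulation; the simulation simply trades the linearization for computing time.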


Because the estimates of undercount are subject to error, 
it is likely that the adjustment will improve the accuracy of 
the allocations to some parties but will decrease the accuracy 
of allocations to other parties. That is, adjustment will move 
some allocations closer to those that would occur with 
perfect data but will move other allocations further away. In 
addition, adjustment will cause underallocations to some 
parties. If the adjustments are in fact to be equitable overall, 
we need to carefully consider what we mean by equity. 
Equity is generally taken to refer to the "spirit and the habit 
of fairness, justness, and right dealing" [2] . This may be 
interpreted in various ways, and for the present purposes we 
need to settle on one explicit interpretation. Since there is 
no unique interpretation of equity, it should be considered 
as a convention, i.e., equity is what we agree it is for the 
practical purposes of determining undercount adjustment. 
Convention is used here in the sense of Keyfitz [7] . 

Agreeing on a convention could involve much political 
debate. However, once a convention of equity is agreed 
upon, it becomes a technical matter to determine how best 
to achieve the agreed-upon measure of equity. The statistical 
operations would be insulated from the political give-and- 
take, with two results: (1) Less chance of politicization of 
statistical agencies, and (2) more efficient achievement of 
policy goals (political debate would be freed from statistical 
encumbrances, so political actors can focus on specifying 
desires rather than statistical methods). 

Here we will consider equity as it pertains to two aspects 
of the adjustment of census counts and allocations. First we 
consider the equity of the end product of the census-taking 
and adjustment processes as a whole, that is, the equity of 
the allocations. Next, we consider equity as it applies to ad- 
justment as a process in itself, separate from the census- 
taking operation. In both cases, we need to formulate ex- 
plicit and precise statements of what we mean, operationally, 
by equity. In particular, as we shall see in section 3, the 
convention of equity that is adopted will influence our 
choice of adjusted figures. 

Equity of Allocations 

To consider the equity of the allocations, we will adopt 
the perspective of decision theory and construct measures of 
equity to reflect our preferences over alternative sets of 
adjusted figures. Thus, considerations will not be limited to 
narrow interpretations of equity as fairness, but include 
social welfare concerns as well. In particular, we will consider 
the formulation of rules that rank different sets of adjusted 
figures, or allocations, according to our notions of equity. 
For concreteness, we consider the allocation of a fixed sum 
among several parties. The allocations will usually be assumed 
proportional to population size. Our concern here is to 
compare the equity of different allocations. 


We take as a starting point the assumption that when a 
set of allocations is identical to the set of target allocations, 
there is maximum equity. We also refer to this as zero in- 
equity. It is important to assume that the target values 
coincide with those allocations that would occur if there 
were no error in the data. Of course, the actual allocation 
formulas are themselves imperfect, and it is probably argu- 
able in individual cases that greater social good can occur 
under some perturbation of the target values defined. How- 
ever, these concerns should not be brought into the context 
of adjusting census counts because of data inaccuracy— they 
pertain to the allocation formulas rather than to the data 
used. For further discussion of statistical equity (arising from 
good data) and political equity (arising from good formulas), 
the reader is referred to the paper by Robert Hill, also in this 
volume. 
Any deviation of allocations from the target values (as 
defined above) implies less than maximum equity, or a degree 
of inequity. Given two alternative sets of allocations, both 
imperfect, it is important to be able to compare their levels 
of inequity. 

Example 2 

Suppose for simplicity that there are three parties par- 
ticipating in the population-based allocation of $100 
million. Assume also that each party has the same popu- 
lation size (hence the same target value) and that the 
three parties are labeled 1, 2, and 3. 

Consider two alternative sets of allocations with the 
following properties: 

              Error in allocation 1   Error in allocation 2 
              (millions of dollars)   (millions of dollars) 

Party 1              -4.0                    -2.0 
Party 2              +2.0                    +1.0 
Party 3              +2.0                    +1.0 

To decide which allocation is more equitable, it is neces- 
sary to have some criterion for comparison. Notice that 
under allocation 2 each component misallocation (i.e., 
the error in allocation to each party) is smaller than under 
allocation 1. Allocation 2 is closer than allocation 1 in 
every way to the target values, and it ought therefore to 
be considered more equitable. 

Other situations invite more disagreement about which 
set of allocations is more equitable. Consider allocations 
3 and 4 below: 

                          Error in allocation 3   Error in allocation 4 
                          (millions of dollars)   (millions of dollars) 

Party 1                          +1.6                    -1.5 
Party 2                          -0.8                    +1.5 
Party 3                          -0.8                     0.0 
Sum of misallocations 
  (signs ignored)                 3.2                     3.0 
Sum of squared 
  misallocations                  3.84                    4.5 
Total underallocation             1.6                     1.5 
Sum of squared 
  underallocations                1.3                     2.2 
Largest underallocation           0.8                     1.5 


If our concerns for equity center on the total amount of 
money that is misallocated or on the total amount under- 
allocated, then allocation 4 is preferable. The total mis- 
allocated under allocation 3 is $3.2 million, while the 
total for allocation 4 is $3.0 million. The total under- 
allocations are $1 .6 million and $1 .5 million, respectively. 
Hence, allocation 4 is more equitable. Alternatively, we 
might think that large errors are disproportionately worse 
than small, so that the -$1.5 million error to party 1 
under allocation 4 is more inequitable than the combined 
effect of the two -$0.8 million errors to parties 2 and 3 
under allocation 3. 

Loss Functions 

In adjusting census counts and allocations, it is not possi- 
ble to directly consider the comparative inequities of all 
likely sets of allocations. For one thing, there are too many 
allocations possible. Also, we have not yet specified precisely 
how we want to evaluate equity so that we can make the 
comparisons. A solution to this dilemma is to utilize explicit 
loss functions to represent an agreed-upon conception of 
equity. Loss functions are constructs that concisely and 
tractably represent one's preferences. In the present context, 
a loss function will assign a number to each set of allocations 
in such a way that whenever a set of allocations A is more 
equitable (under an agreed-upon conception of equity) than 
another set of allocations B, the loss function assigns a 
greater number to B than to A. (To avoid possible confusion, 
it should be noted that since different loss functions repre- 
sent different preferences, it is not meaningful to compare 
the values of different loss functions. Only the values of the 
same loss function over alternative allocations are comparable.) 

Example 3 

Consider once again allocations 3 and 4 as described 
in example 2. If equity is to be judged on the basis of the 
total amount of money misallocated, then we may define 
the loss function to be proportional to the sum of the 
absolute values of the errors in allocation. Thus the nu- 
merical value taken by the loss function is 3.2 for alloca- 
tion 3 and 3.0 for allocation 4. (The proportionality 
constant does not matter.) The loss function reliably 
represents our preferences, telling us that allocation 4 is 


more equitable than allocation 3 under this conception 
of equity. Applying the same loss function to allocations 
1 and 2 of example 2, we observe that the value of the 
loss function is 8.0 for allocation 1 and 4.0 for allocation 
2, so that allocation 2 is more equitable than allocation 1 . 
The loss function also tells us that allocations 3 and 4 
are both more equitable than allocations 1 or 2, because 
the value of the loss function is smaller for allocations 3 
and 4 (3.2 and 3.0) than for allocations 1 and 2 (8.0 and 4.0). 

An alternative equity concept reflects disproportion- 
ately greater concern with large errors than small; recall 
that under this concept allocation 3 might be judged more 
equitable than allocation 4. A way to react to this concern 
about large errors is to consider the square of the errors 
rather than their magnitudes. If we consider overalloca- 
tions and underallocations to be equally inequitable, we 
can consider the loss function defined to be the sum of 
the squares of the errors in allocation. This loss function 
takes the value 3.84 for allocation 3 and 4.5 for alloca- 
tion 4, so allocation 3 is more equitable. 

Other equity criteria treat overallocations and under- 
allocations differently. For example, the inequity of an 
overallocation to a party might be considered negligible, 
since the party is receiving a net gain in allocation and thus 
suffers no harm. If, simultaneously, there is extreme 
concern with large underallocations, then one might com- 
pare the equity of alternative allocations on the basis of 
the sum of squares of the underallocations. The value of 
this loss function is 1.3 for allocation 3 and 2.2 for 
allocation 4, so allocation 3 is more equitable under this 
loss function also. 
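The arithmetic above can be reproduced programmatically. A short sketch follows; the error vectors are hypothetical stand-ins (the actual allocation tables appear with examples 2 and 3 earlier in the paper), chosen only to be consistent with the loss values quoted in the text.

```python
# Hypothetical allocation errors (positive = overallocation, negative =
# underallocation), chosen to reproduce the loss values quoted in the text.
alloc3 = [1.6, -0.8, -0.8]
alloc4 = [1.5, -1.5, 0.0]

def abs_loss(errors):
    # Sum of absolute errors in allocation.
    return sum(abs(e) for e in errors)

def sq_loss(errors):
    # Sum of squared errors; penalizes large errors disproportionately.
    return sum(e * e for e in errors)

def under_sq_loss(errors):
    # Sum of squared underallocations only; overallocations cost nothing.
    return sum(e * e for e in errors if e < 0)

# Absolute-error loss: 3.2 vs. 3.0, so allocation 4 is more equitable.
# Squared-error loss: 3.84 vs. 4.5, so allocation 3 is more equitable.
# Squared-underallocation loss also favors allocation 3, to rounding.
```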

Example 3 has illustrated how loss functions can be used 
to represent different concepts of equity of allocations. The 
question of what concept of equity, or equivalently, what 
loss function should be used has not yet been addressed. 
Therefore, several more complex concepts of equity will 
be considered below. 

Most methods for deriving loss functions to measure 
equity or, equivalently, for comparing the inequities of two 
sets of allocations proceed in two steps [3, 5, 15, 16] . First, 
the inequity to each party in the allocation process is con- 
sidered. Then, from these individual considerations, the over- 
all equity is evaluated. 

In considering loss functions to represent more complex 
concepts of accuracy, we will focus primarily on how to 
evaluate the individual inequities to the different parties in 
the allocation program. The measure of overall equity will 
be taken here as the sum of the individual inequities. (Other 
formulations are possible as well but will not be used here.) 2 

In particular, we will consider (1) ways to treat inequity 
from overallocation and underallocation differently, (2) 
differential treatment of different parties, and (3) how to 
consider equity for subgroups (e.g., racial or ethnic groups) 
that do not receive allocations directly, but share in alloca- 
tions to the political jurisdictions in which they reside. 

Overallocation versus Underallocation 

It is clear that a positive inequity is associated with a 
party that is underallocated funds because of data error. 
But what of a party that is overallocated funds? When a 
fixed sum is being allocated, an overallocation to one party 
implies underallocation to another, and thus overallocations 
contribute indirectly to the overall inequity. However, since 
the inequity from underallocation is reflected in the measures 
of inequity for those parties subject to underallocation, the 
inequity from overallocation should be considered separately. 
Since overallocation to a party produces a net gain for that 
party, it is conceivable that an individual overallocation is 
associated with negative disutility. 

Example 4 

To illustrate the difference in individual inequities 
arising from overallocation and underallocation, consider 
a population-based allocation of a fixed sum to two 
parties, where it is not known who is party 1 and who is 
party 2. Suppose that, despite adjustment of data, the best 
allocation possible would be overallocating $1 million to 
party 1 and underallocating $1 million to party 2. Now 
suppose that by spending an extra $1.8 million on data, 
the shares of the fixed sum to be allocated could be de- 
termined perfectly, but this data expenditure would be 
taken from the fixed sum to be allocated. Then both 
parties would be underallocated (relative to that original 
fixed sum) $0.9 million. 

                          Party 1    Party 2 
Error in allocation 5 
 (with existing data) 
 (millions of dollars)      +1.0       -1.0 

Error under allocation 6 
 (with extra data) 
 (millions of dollars)       -.9        -.9 

2 In effect, we are adopting a utilitarian approach. Loss functions 
based solely on the largest misallocation or largest underallocation do 
not fall in the class of loss functions we consider, but they can be 
approximated by loss functions we do consider. 

Which allocation is more equitable? It depends upon 
how the inequities from overallocation are perceived. 
Suppose we represent the overall inequity by the sum of 
the absolute values of the underallocations, plus a con- 
stant C, times the sum of the overallocations: 

Sum of underallocations + C x sum of overallocations 

(Similar examples can be constructed for squared error or 
other loss functions.) Then the inequity for allocation 5 
is measured at 1 + C and the inequity for allocation 6 is 


measured at 1.8. In this class of loss functions, allocation 
5 is judged to be more equitable than allocation 6 only if 
C is less than 0.8. 

To push the example further, suppose the extra data 
would have cost only $1 million rather than $1.8 million, 
so that allocation 7 would have occurred. 

                          Party 1    Party 2 
Error under allocation 5 
 (millions of dollars)      +1.0       -1.0 

Error under allocation 7 
 (millions of dollars)       -.5        -.5 

Now allocation 5 is more equitable than allocation 7 
(under the loss function above) only if C is negative; i.e., 
only if overallocations are treated as negative inequities. 
If we are indifferent between the two allocations, then C 
is zero, and if allocation 7 is preferred, then C is positive. 
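The threshold values of C described above can be checked directly; a minimal sketch, using the errors from allocations 5, 6, and 7:

```python
def inequity(errors, C):
    # Sum of absolute underallocations plus C times sum of overallocations.
    under = sum(-e for e in errors if e < 0)
    over = sum(e for e in errors if e > 0)
    return under + C * over

alloc5 = [1.0, -1.0]    # overallocate party 1, underallocate party 2
alloc6 = [-0.9, -0.9]   # perfect shares, but $1.8 million spent on data
alloc7 = [-0.5, -0.5]   # perfect shares, but $1.0 million spent on data

# Allocation 5 beats allocation 6 exactly when 1 + C < 1.8, i.e., C < 0.8;
# it beats allocation 7 exactly when 1 + C < 1.0, i.e., C < 0.
```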

Example 4 suggests that we may want to treat overalloca- 
tions and underallocations asymmetrically. In particular, 
we may want to treat overallocations to individual parties as 
implying individual benefits. This consideration has in- 
teresting implications for the form of the measures of in- 
dividual inequities. For example, consider the loss function 
defined to be proportional to the sum of the squares of the 
underallocations, plus a negative constant C, times the sum 
of the squares of the overallocations: 

Sum of squared underallocations 
+ C x sum of squared overallocations,  C < 0 

Such a loss function may violate one of our basic stipulations 
about our concept of overall equity, viz, perfect equity 
occurs when the set of allocations equals its set of target 
values and otherwise a degree of inequity occurs. The loss 
function just proposed takes the value zero when there are 
no errors in allocation. But the loss function can take negative 
values for some possible configurations of errors in allocations, 
implying that these configurations are more equitable 
than that of no errors in allocations. For example, if 
C = -0.6, then the value of this loss function for allocation 
3 is -0.256 (= 1.28 - 0.6 x 2.56). Loss functions taking negative 
values are not permissible because they do not faithfully 
reflect our concept of statistical equity. Statistical and 
political concerns need to be kept separate. It would be a 
mistake to try to use statistical adjustments to compensate 
for perceived shortcomings in the allocation. As the example 
shows, care is needed in formulating loss functions to rep- 
resent concepts of equity. 
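The negative value can be confirmed with the figures quoted in the text (1.28 and 2.56 for allocation 3's squared under- and overallocations):

```python
# Squared under- and overallocations for allocation 3, as given in the text.
under_sq = 1.28
over_sq = 2.56
C = -0.6

# Asymmetric squared loss: a negative C credits overallocations as benefits.
loss = under_sq + C * over_sq

# The loss comes out negative (-0.256), i.e., "better than" a perfect
# allocation, violating the stipulation that zero error be the unique optimum.
```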

We note in passing that in a true fixed-pie allocation, the 
asymmetric losses to individuals may combine to yield a 
symmetric aggregate loss function. Thus, if the loss function 
is taken to equal the sum of underallocations, plus a constant 
C, times the sum of the overallocations, then the loss func- 

tion may be reexpressed as (1 + C)/2, times the sum of the 
absolute errors in allocation. (To see this, observe that in a 
fixed-pie allocation, the sum of underallocations and the sum 
of overallocations are equal to each other, hence equal to 
half the sum of the absolute errors in allocation.) 
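This identity is easy to verify numerically for any fixed-pie error vector; a brief sketch, with an arbitrary choice of C:

```python
def asymmetric_loss(errors, C):
    # Sum of underallocations plus C times the sum of overallocations.
    under = sum(-e for e in errors if e < 0)
    over = sum(e for e in errors if e > 0)
    return under + C * over

# In a fixed-pie allocation the errors must sum to zero.
errors = [2.0, -1.5, -0.5]
C = 0.3

total_abs = sum(abs(e) for e in errors)
# Identity: under + C * over == (1 + C)/2 times the sum of absolute errors.
```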

Differential Treatment of Different Parties 

In practice there may be a strong desire to treat different 
groups differently. For example, an underallocation of a 
given dollar amount may be judged more inequitable for a 
State with a small population, such as Vermont, than for a 
State with a large population, such as California. One way to 
accommodate differential treatment is to weight the indi- 
vidual inequities by different factors. That is, we may con- 
sider loss functions defined to be weighted sums of the 
individual inequities. 

Example 5 

Let the subscript i refer to a party (e.g., a State or local 
government) in the allocation process, let x_i denote the 
error in the allocation to party i, and let w_i denote a 
weight for party i. First, consider the simple loss function 
defined by: 

Σ_i |x_i| 

where Σ_i is read "the sum over all parties i" and |x_i| is the 
absolute value of the error in allocation x_i. To treat differ- 
ent parties differentially, we may consider the loss function 

Σ_i w_i |x_i|          (1) 

For example, w_i might be taken inversely proportional to 
the population of unit i. This reflects concern with per 
capita errors in allocation. Or, w_i might be taken inversely 
proportional to the target value for party i. This reflects 
concern with proportional errors in allocation. 

We may wish to use different weights depending on 
whether the error in allocation is positive or negative. Let 
I⁺ = the set of units with overallocations, I⁻ = the set of 
parties with underallocations, and let w_i⁺ and w_i⁻ be two 
weights for party i. The numbers w_i⁺ and w_i⁻, respec- 
tively, weight overallocations and underallocations to 
party i. We require the weights w_i⁻ to be positive because 
underallocations imply positive inequity. We may consider 
the loss function given by: 

Σ_{I⁺} w_i⁺ x_i + Σ_{I⁻} w_i⁻ |x_i| 

where Σ_{I⁺} (or Σ_{I⁻}) is read "the sum over all units i with 
overallocations (or underallocations)."³ For example, one 
might choose w_i⁺ inversely to population (to reflect con- 
cern about per capita overallocation), and one might 
choose w_i⁻ inversely to the product of population and per 
capita income, to reflect concern that per capita under- 
allocations are more severely felt by parties with lower 
per capita income. 
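A short sketch of such a weighted, asymmetric loss function follows; the populations and dollar errors are hypothetical, chosen only to illustrate the inverse-population weighting:

```python
# Hypothetical parties: (error in dollars, population). Positive error =
# overallocation, negative error = underallocation.
parties = [(5_000_000, 1_000_000), (-5_000_000, 500_000)]

def weighted_loss(parties):
    total = 0.0
    for error, pop in parties:
        w = 1.0 / pop  # inverse-population weight: a per capita error
        if error > 0:
            total += w * error          # overallocation term (w_i+)
        else:
            total += w * abs(error)     # underallocation term (w_i-)
    return total

# The small party's $5 million underallocation counts $10 per capita,
# double the large party's $5 per capita overallocation.
```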

As discussed in the paragraph following example 4, 
we may want to choose w_i⁺ to be negative. In this case, 
however, our choice of weights is constrained. For ex- 
ample, suppose that for all units i the ratio of w_i⁺ to w_i⁻ 
is a constant C lying between 0 and -1. Then, if the loss 
is to take its minimum, zero, only when there are no 
errors in allocation, it is necessary⁴ that the ratio of the 
smallest w_i⁻ to the largest w_i⁻ be greater than the abso- 
lute value of C. This is a potentially significant constraint. 
It shows that the following loss function is not permissible 
for measuring the inequity of errors in allocations to States: 

Sum of per capita underallocations + C x sum of per 
capita overallocations 

where C is less than -0.04 (note that the ratio, the re- 
ciprocal of the largest State population to the reciprocal 
of the smallest, is less than 0.04). In formulating loss 
functions, constraints such as this need to be considered. 

Population Subgroups 

The recipients of allocations under Federal grants-in-aid 
programs are typically State and local governments. But con- 
cern for equity also extends to racial, ethnic, and other 
groups who may not receive allocations directly, but who 
share in the allocations to political jurisdictions. Thus, for 
example, the disparity between the estimates of undercount 
for blacks (7.7 percent) and whites (1.9 percent) suggests 
that, as a group, blacks have been underallocated funds 
under programs that use population data. 

Example 6 

Consider an allocation program that allocates a fixed 
sum of money to States in proportion to total population. 
Assume that every subgroup in each State shares in the 
State's allocation proportionally to the size of the sub- 
group in the State. If P_si is the true population of sub- 
group s in State i, P_s+ is the true total population over all 
States of subgroup s, P_+i is the true total population in 
State i, and P_++ is the true total national population, 
then the fraction of the total national allocation that 
should go to State i is P_+i/P_++. The fraction of the total 
national allocation that should go to subgroup s in State 
i is 

(P_si/P_+i) x (P_+i/P_++) = P_si/P_++ 

The fraction of the total national allocation that should 
go to subgroup s is obtained by summing P_si/P_++ over all 
States i to yield 

P_s+/P_++ 

Letting s refer to blacks, using estimates of 7.7 percent 
undercount for blacks and 2.5 percent undercount nation- 
wide, the fraction of the total national allocation for 
blacks as a group is estimated to be too low by 5.3 per- 
cent, or 

0.053 P_s+/P_++ 
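The 5.3-percent figure follows from the two undercount rates alone, since the census-based share for blacks is the true share multiplied by (1 - 0.077)/(1 - 0.025):

```python
black_undercount = 0.077
national_undercount = 0.025

# Ratio of the census-based allocation share to the true share for
# blacks as a group: counted blacks over counted total, relative to truth.
share_ratio = (1 - black_undercount) / (1 - national_undercount)

# Proportional shortfall in the allocation share: about 0.053,
# i.e., the share is estimated to be 5.3 percent too low.
shortfall = 1 - share_ratio
```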


3 This may be reexpressed as Σ_i (w_i⁺ x_i⁺ + w_i⁻ x_i⁻), where x_i⁺ 
= max (x_i, 0) and x_i⁻ = max (-x_i, 0). 

4 A simple example illustrates this. Suppose a fixed amount is to 
be allocated to two parties, indexed by 1 and 2, such that party 1 is 
overallocated by a positive amount X and party 2 is underallocated by 
X. Let the weights be given by w_1⁻ = 1, w_1⁺ = C, w_2⁻ = A, and w_2⁺ 
= AC. Suppose that A is less than 1, so that w_2⁻ is smaller than w_1⁻. 
The value taken by the loss function is clearly CX + AX, which is 
greater than zero only if A, the ratio of w_2⁻ to w_1⁻, exceeds the 
absolute value of C. Further discussion is found in [15]. 



Generally, the error x_s in allocation to subgroup s arising 
from errors x_i in allocations to States i is 

x_s = Σ_i (P_si/P_+i) x_i 

where P_si is the population of subgroup s in State i and P_+i 
is the total population of State i. Measures of inequity for 
individual subgroups may be formulated analogously to the 
measures for other individual parties described earlier. Thus, 
possible measures of inequity for a subgroup s include 
w_s |x_s| and w_s (x_s)², where w_s is a positive weight. Another 
possible measure is: 

w_s⁺ x_s if x_s is an overallocation, and 
w_s⁻ |x_s| if x_s is an underallocation 

where w_s⁺ is a (possibly negative) weight and w_s⁻ is a posi- 
tive weight. 
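The formula for x_s can be sketched with hypothetical figures; a subgroup inherits each State's allocation error in proportion to its share of that State's population:

```python
# Hypothetical data: for each State i, (P_si, P_plus_i, x_i) = subgroup
# population, total State population, and error in the State's allocation.
states = [
    (200_000, 1_000_000, -10.0),   # subgroup is 20% of State 1
    (300_000, 600_000, 4.0),       # subgroup is 50% of State 2
]

# x_s = sum over States i of (P_si / P_+i) * x_i
x_s = sum(p_si / p_i * x_i for p_si, p_i, x_i in states)

# Here 0.2 * (-10) + 0.5 * 4 = 0: the subgroup's gain in one State
# exactly offsets its loss in the other.
```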

An important question is how to combine the measures 
of inequity for individual subgroups into a measure of total 
inequity. For example, considering the measure of individual 
inequity to have the general form w_s|x_s|, is it desirable to com- 
bine subgroup inequities into a measure of "overall inequity 
for subgroups," say, Σ_s w_s |x_s|? Furthermore, how do we want 
to combine the measures of inequity to subgroups and the 
measures of inequity to parties (e.g., State governments) 
participating directly in the allocation program? Do we want 
to add them; for example, do we want to use 

Σ_i w_i |x_i| + Σ_s w_s |x_s| 

If not, how should we simultaneously minimize both meas- 
ures of overall inequity, Σ_i w_i|x_i| and Σ_s w_s|x_s|? That is, what 
tradeoffs do we prefer between inequities to subgroups s and 
parties i that directly participate in the allocation program? 
In any case, what subgroups are to be considered? 

(We note, however, that the situation simplifies 
if each subgroup s is contained wholly within one State (or 
party) i, so that the subgroup may be denoted s(i). For ex- 
ample, the black population of each State could be con- 
sidered separately rather than as a national group. In this 
case, the error in allocation to subgroup s(i) is 

x_s(i) = (P_s(i)/P_+i) x_i 

and if the inequity to subgroup s(i) is w_s(i) |x_s(i)|, then the 
total inequity to States (or parties generally) and to sub- 
groups can be written as 

Σ_i w_i |x_i| + Σ_i Σ_s(i) w_s(i) (P_s(i)/P_+i) |x_i| 

or, more simply, Σ_i w_i* |x_i|, where w_i* = w_i + Σ_s(i) w_s(i) P_s(i)/P_+i. 
This is of the general form of display (1) in the preceding 
section.) 
Equity of the Adjustment Process 

So far, we have been considering the equity in the final 
allocations. Now we consider the equity of the adjustment 
process as a separate operation. Because the estimates of 
undercount are subject to error, there is a probability that 
the adjustment will (1) cause some parties to be underallo- 
cated whereas they otherwise would not be and (2) increase 
the severity of the underallocation to other parties. 

Situation (1) could occur if a State whose relative under- 
count was equal to or less than that of the Nation as a whole 
was estimated to have an even smaller relative undercount 
(compared to the Nation as a whole). In this case, the adjust- 
ment could greatly reduce the State's allocation. For a State 
in this situation, the adjustment would cause an underallocation. 

Situation (2) could occur if a State whose relative under- 
count exceeded that for the Nation as a whole was estimated 
to have a relative undercount equal to or less than that for 
the Nation as a whole. In this case, adjustment would in- 
crease the underallocation to the State. 

Given the quality of estimates of undercount that we 
are likely to obtain, we can expect situations (1) and (2) 
to arise in any adjustment process with some probability. 
Concern over the equity of the adjustment process might 
lead us to try to reduce the probability of (1) or (2) for any 
given party to below a specified level. Here equity consider- 
ations serve as constraints on the adjustment process. In 
contrast, equity considerations that focused just on the out- 
comes of the allocations after adjustment could be repre- 
sented as loss functions. The two kinds of equity may not 
be compatible, and tradeoffs may need to be considered. 

Formally, equity concerns about the adjustment process 
can also be represented by loss functions, and the several 
kinds of equity we have been discussing can be jointly repre- 
sented by a multivariate loss function. However, the choice 
of a minimization criterion for such a loss function is equi- 
valent to the choice of a tradeoff among the alternative 
measures of inequity. An extensive discussion of theory for 
evaluating tradeoffs in this kind of situation is given in [6] . 


Having chosen a criterion of equity and specified a loss 
function to represent our notions of equity, we may char- 
acterize the most equitable, or the optimal, set of population 
estimates as that which minimizes the expected value of the 
loss function. 5 The difference between the optimal popula- 
tion estimate for an area and the census count is the optimal 
adjustment for the area. The values of the optimal adjust- 
ments are affected by both the loss function used (i.e., the 
chosen criteria of equity and preferred tradeoffs among 
them) and the probability distribution according to which 
the loss function's expected value (or expected loss) is computed. 

The questions we will consider include the following: 

1. How sensitive are the optimal adjustments to different 
notions of equity of allocations? 

2. How sensitive are the optimal adjustments to the 
accuracy of the estimates of undercount, or equiva- 
lently, to the probability distribution with respect to 
which expected loss is computed? 

3. How do considerations of equity of the adjustment 
process affect the choice of optimal adjustments? In 
particular, how do the probabilities of causing an 
underallocation or increasing the severity of an under- 
allocation vary with the accuracy of the estimates of 
undercount? 

5 As stated earlier, if the loss function is vector-valued, the criter- 
ion of minimization may be complicated, and must be formulated to 
reflect our preference over tradeoffs among different notions of 
equity. 

We have not tried to take uncertainty considerations into account 
in formulating the loss function, so the loss function may not strictly 
be interpreted as the negative of a Von Neumann-Morgenstern utility 
function. Next to the other problems with developing appropriate 
loss functions, this one is surely minor. 

A few preliminary words about limitations of the sensi- 
tivity analysis are in order. These limitations pertain to the 
formulation of the probability distribution with respect to 
which the expected value of the loss function is computed. 
Determination of this probability distribution involves assess- 
ment of the accuracy of the undercount estimates, or equiva- 
lently, of the alternative population estimates. As discussed 
in section 1, there are many sources of uncertainty in these 
estimates, and to some extent formulation of the probability 
distributions will necessarily depend on the subjective be- 
liefs of experts. Further consideration of the formulation of 
these probability distributions is beyond the scope of this 
paper. 
The most logical approach to the problem of finding 
the optimal adjustments is that of Bayesian decision theory 
[10], in which the probability distribution represents the 
judgment of experts about the probable values of true 
population sizes, given all available data. For the present 
purposes of illustration, the Bayesian approach is too 
complicated. Instead, we will consider the sensitivity of the 
optimal adjustments under a somewhat ad hoc optimiza- 
tion approach to a highly simplified model. This model 
will be easy to work with and the sensitivity of the optimal 
adjustments derived from it will provide at least tentative 
insight into questions 1 through 3 above. 

Accuracy and Equity of Allocations 

We use the following model: Let θ_i = P_i/P_+ be the true 
fraction of total population belonging to party i, and con- 
sider two estimates of θ_i: that derived from the census 
counts, C_i, and that derived from alternative methods (e.g., 
demographic analysis or synthetic methods), E_i. We assume 
that E_i is normally distributed with mean θ_i and variance 
V_i². The errors in the census shares are modeled as fixed 
biases with negligible variances (and negligible covariances 
with E_i), so that the expectation of C_i is θ_i - U_i and the 
variance of C_i is small enough to ignore. We interpret U_i 
as the differential undercount. To optimally estimate θ_i, 
we consider choosing the weighted average of C_i and E_i 
that minimizes the expected value of the loss function. 

First, we consider the weighted squared-error loss. Writing 
the weighted average as Y_i = f_i C_i + (1 - f_i) E_i, we want to 
choose f_i to minimize the expectation of Σ w_i (Y_i - θ_i)². 
If the parties are States, so that Σ θ_i is known to be 1, we 
may wish to impose the constraint⁶ that Σ Y_i = 1. Under the 
above assumptions, the desired values of f_i are given by 

f_i* = V_i²/(V_i² + U_i²) - λ(C_i - E_i)/[w_i (V_i² + U_i²)]     (5) 

where the constant λ is chosen⁷ so that Σ Y_i = 1. 

Illustrative calculations are discussed below. To use the 
optimal weights f_i*, estimates of U_i and V_i are needed. To 
estimate V_i, we would ideally use the difficult techniques 
discussed much earlier; for the calculations presented below 
a range of values from .05 U_i to 5.0 U_i was considered. To 
estimate U_i we will use the estimates of differential under- 
count given by Û_i = E_i - C_i. Thus, in practice (for this ex- 
ample), the weights f_i are random. For the calculations, 
three sets of E_i for States were used, corresponding to the 
estimates in SPRR (table f-1, cols. 1 and 13, and table VII-D, 
col. 7) denoted "SOR-3-1 WCF-1 BACF-1," "basic synthetic 
(age, race, sex)," and "composite-2." Four sets of weights 
were used: w_i = 1/C_i, w_i = 1, w_i = C_i, and w_i = C_i². 

The first point worth noting is that the values of f_i* were 
determined by the first of the two terms on the right-hand 
side of equation (5). The term involving λ was always smaller 
than 10⁻⁶, whereas f_i* ranged from 0.0025 to 0.96. Thus, 
the optimal values of f_i* essentially are given by the first 
term on the right of equation (5). Ignoring the second term, 
we may reexpress (5) as 

1 - f_i* = 1/(1 + V_i²/U_i²)     (6) 

As the relative variance of the estimate of differential under- 
count, V_i²/U_i², increases, the weight 1 - f_i* decreases. 
Some values of 1 - f_i* and V_i/U_i are shown below. 

V_i/U_i  = 0.05  0.1   0.4   1.0   1.4   2.0   3.0   4.0   5.0 

1 - f_i* = 0.998 0.99  0.86  0.50  0.34  0.20  0.10  0.06  0.04 

Thus, the believed accuracy of the undercount estimates has 
a great effect upon the choice of optimal adjustments. 8 
Notice, however, that even when the available estimates of 
undercount are highly inaccurate (V-/U- at least 3.0), the 
census figures are still adjusted somewhat. 
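The tabulated weights are the function 1/(1 + (V/U)²) evaluated on a grid, which is easily reproduced:

```python
def alt_weight(v_over_u):
    # Weight 1 - f* given to the alternative estimate E, as a function
    # of the relative accuracy V/U of the undercount estimate.
    return 1.0 / (1.0 + v_over_u ** 2)

# Reproduces the table: e.g., V/U = 1.0 gives 0.50 and V/U = 2.0 gives
# 0.20; even at V/U = 3.0 the census is still adjusted by 10 percent of
# the estimated differential undercount.
```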

6 The Bayesian approach would not require this constraint to be 
separately imposed. 

7 The constant λ, chosen so that Σ Y_i = 1, is given by 

λ = [-1 + Σ_j (V_j² C_j + U_j² E_j)/(V_j² + U_j²)] / [Σ_j (C_j - E_j)²/(w_j (V_j² + U_j²))] 

Note that the weights f_i* as given by (5) are really random, de- 
pending on C_i and E_i. One alternative way to constrain Σ Y_i = 1 is to 
choose f_i*, disregarding the constraint (in this case f_i* is given by the 
first term on the right-hand side of (5)), and then to divide each Y_i by 
Σ Y_i. This is intuitively less appealing than (5). For example, if E_i 
and C_i are equal, then using f_i* from (5) we have Y_i = C_i = E_i, but if 
we simply divide each Y_i by Σ Y_i, then Y_i ≠ C_i = E_i. 

8 This conclusion, of course, depends on the models used above and 
on the optimization approach adopted. A relationship between the 
Bayesian approach and the present approach is easily seen in the uni- 
variate case. Under the approach above, the optimal weight was f* 
= V²/(V² + U²). To consider a Bayesian approach, let E (condi- 
tionally on θ) be normally distributed with mean θ and variance 
V², let C be normally distributed with mean θ and variance U², 
and let θ be distributed with large variance compared to U² and 
V². Then the posterior mean for θ is f*C + (1 - f*)E, where f* = V²/ 
(V² + U²), the same as the optimal Y derived above. 


The fact that the second term on the right-hand side of 
equation (5) was negligible implies that the weights w_i 
in the loss function are unimportant for determining the op- 
timal adjustments. This has implications for the equitable 
treatment of subgroups of the population, as previously 
discussed. If subgroups of the population living in different 
States are treated as different subgroups, then concern about 
equity of allocations to these subgroups can be reflected by 
modifying the weights w_i for the States in which they re- 
side. (See the parenthetical note at the end of the Population 
Subgroups section.) However, the insensitivity of the optimal 
weights f_i* to variation in the weights w_i indicates that equity in 
allocations to these subgroups can best be achieved by maxi- 
mizing the equity in the allocations to the States, that is, by 
choosing adjustments to minimize the loss function for errors 
in allocations to States. 

If subgroups of the population living in different States 
are not treated separately but instead are treated as aggre- 
gates, the situation is more complex, but I conjecture that a 
similar result holds. That is, the optimal adjustments derived 
by considering equity in allocations to States and to sub- 
groups will be identical to those derived by considering 
equity in allocations to States alone. Of course, this con- 
clusion needs to be tested under other models as well. 

Next, we consider the sensitivity of the optimal allocations 
to changes in the form of the loss function. An alternative 
to the weighted squared-error loss function is the weighted 
absolute-error loss function Σ w_i |Y_i - θ_i|. The optimal 
weights f_i* under this loss function do not have tidy ex- 
pressions⁹ like those for squared-error loss (5), but they 
too depend only on V_i²/U_i² (if the constraint that Σ Y_i = 1 is 
not imposed). Some values are shown below. 

V/U               = 2.0  1.4  1.0  0.7  0.4  0.3  0.25  0.15  0.1 

f* squared error  = .80  .67  .50  .34  .15  .09  .06   .02   .01 

f* absolute error = .70  .59  .44  .29  .14  .08  .05   .02   .01 

difference        = .10  .08  .06  .05  .01  .01  .01   .00   .00 

For all the values of V/U considered (note we have dropped 
the subscript i), the optimal adjustments under the two loss 
functions are close. The optimal adjustments under absolute- 
error loss uniformly give more weight to the alternative esti- 
mate, but for values of V/U less than 1, the difference is slight. 

To sum up, the choice of weights for the loss function 
and the form of the loss function have little effect on the 
magnitude of the optimal adjustment. This suggests that a 
convention for equity in allocations can be agreed upon. 
On the other hand, the optimal adjustments are sensitive 
to the believed accuracy of the estimates of undercount. 
Obtaining estimates of accuracy is difficult and may give 
rise to controversy among statisticians.¹⁰ 

9 The optimal f* for this loss function are the solutions to 

R (Φ(Q*R) - 0.5) = φ(Q*R) 

where R = U/V, f* = Q*/(1 + Q*), and Φ and φ are the normal cumu- 
lative distribution function and probability density function, respec- 
tively (and we do not require Σ Y_i = 1). 

Accuracy and Equity of the Adjustment Process 

Under the models considered so far, we do not fully ad- 
just for estimated undercount but adjust by a fraction 1 - f*. 
It is important to realize that this does not necessarily under- 
adjust for areas whose undercount is more severe than the 
national average. That is, under the optimal adjustments as 
derived, there is a probability of overadjusting. 

To see this, we consider the optimal adjustments (6). 
Recall that the optimal weights f* are estimated on the 
basis of estimates of U and V. Suppose V is known but U is 
estimated by Û, the estimated differential undercount 
(Û = E_i - C_i). Thus, the optimal adjustment for a State may 
be expressed as 

Û³/(Û² + V²)     (7) 

Suppose U is positive, so that the State was undercounted 
more severely than the Nation as a whole. Then the optimal 
adjustment (7) overadjusts if Û³/(Û² + V²) exceeds U. Under the 
distribution we have been considering for E, Û is normally 
distributed with mean U and variance V². The probability 
of overadjusting may thus be represented as 

Prob [Z (Z + U/V)² - U/V > 0] 

where Z = (Û - U)/V has a standard normal distribution. 
These probabilities are shown below for various values of 
V/U. (The probability of overadjusting does not depend on 
the sign of U.) 

V/U                          = .05 .10 .20 .40 .60 1.0 2.0 3.0 5.0 

Probability of overadjusting = .48 .46 .43 .38 .35 .32 .31 .31 .32 
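The probability of overadjusting can be checked by simulation under the model's assumption that Û is normally distributed with mean U and variance V²; the sketch below estimates it by Monte Carlo rather than evaluating the displayed probability analytically.

```python
import random

def prob_overadjust(v_over_u, n=200_000, seed=1):
    # Probability that the adjustment (7), u_hat**3 / (u_hat**2 + v**2),
    # exceeds the true differential undercount U (taken as 1 w.l.o.g.,
    # since the probability depends only on the ratio V/U).
    rng = random.Random(seed)
    u, v = 1.0, v_over_u
    hits = 0
    for _ in range(n):
        u_hat = u + v * rng.gauss(0.0, 1.0)   # u_hat ~ N(U, V^2)
        if u_hat ** 3 / (u_hat ** 2 + v ** 2) > u:
            hits += 1
    return hits / n

# At V/U = 1 the probability is about 0.32, and at V/U = 0.05 about
# 0.48, matching the table above.
```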

For large values of V/U (1 or greater), the probability of 
overadjusting is fairly constant. For values of V/U less than 
1, the optimal rule is "conservative" in the sense that the 
probability of overadjusting decreases as the accuracy of the 

10 Thus (see footnote 8, above), in the Bayesian model corre- 
sponding to our example the distribution of C was taken to be normal 
with mean θ and variance U², but the compelling reasons for this 
particular distributional assumption are hard to find. 


undercount estimate decreases. For all the values of V/U 
considered, the probability exceeds 0.3. 

This has possible implications if one of the criteria of 
equity is that adjustment should not cause a party that 
would not be underallocated funds under the unadjusted 
census data to be underallocated funds under the adjusted 
figures, or more briefly put, adjustment should not cause 
harm. This notion of equity might be interpreted to mean 
that for any State, the probability that adjustment causes 
harm should be less than some specified threshold value. 
If the threshold is less than 0.3, it might be desirable to 
shrink the optimal weights 1 - f * even further for States with 
negative U. Unfortunately, since the sum over States of 
f*U must equal zero, this would imply that the weights 
1 - f* for States with positive U must also be shrunk. This 
would increase the inequity in the allocations. 

Next, we consider the probability that adjustment will be 
in the wrong direction entirely. If U is negative, then adjust- 
ment in the wrong direction increases the underallocation. 
For the weighted-average rules we have been discussing, this 
probability is approximately the probability that Û is greater 
(less) than 0, given that U is less (greater) than 0. This prob- 
ability depends on the absolute value of the ratio V/U and 
is shown below. 


V/U                                         = .20 .40 .60 .80 1.0 2.0 3.0 5.0 

Probability of adjusting in wrong direction = .00 .01 .05 .11 .16 .31 .37 .42 
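These values are the normal tail probability Φ(-U/V), reproducible with the standard library's error function:

```python
import math

def phi(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_wrong_direction(v_over_u):
    # Probability that u_hat falls on the wrong side of zero, i.e., that
    # u_hat < 0 given U > 0 (or vice versa), with u_hat ~ N(U, V^2).
    return phi(-1.0 / v_over_u)

# Reproduces the table: about .16 at V/U = 1.0, .31 at 2.0, .42 at 5.0,
# and essentially zero at V/U = .20.
```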

It is plausible that the absolute value of V/U is larger for 
areas with U close to zero, and thus adjustment by the 
weighted average rule will increase the harm to areas with 
small differential undercount (U slightly less than zero) 
with higher probability than for areas with large differential 
undercount (U much less than zero). However, since the 
probabilities of adjusting in the wrong direction do not 
depend on f* , they cannot be altered by shrinking f*. If one 
of the equity considerations is that adjustment will not in- 
crease the severity of underallocation to any party, then a 
possible approach would be as follows: For any State for 
which the probability of adjusting in the wrong direction ex- 
ceeds a threshold value, the adjustment will be scaled down 
to force the expected increase in underallocation below a 
specified level. 

This would have the same kind of effect noted above 
for the equity notion that adjustment should not cause 
harm. These considerations of equity of the adjustment 
process imply that less funds should be reallocated under 
adjustment than would be optimal under the considerations 
of equity in allocations alone. Further work is needed to 
explore the quantitative implications of these opposing 
equity considerations. Political discussion is also needed to 
determine whether there exist substantial concerns over 
equity in the adjustment process, as described. 


Thus far, we have been addressing the issue of equity in
a narrow sense, disregarding statistical design questions such
as how much data accuracy is needed, how resources should
be allocated to improve the data, how much money should
be spent on collecting and analyzing the data, and whether
adjustment should be performed.

Loss functions and decision theory are useful for design- 
ing the entire census effort, and considering the census as a 
whole lends perspective for considering undercount adjust- 
ment. Decision theory tells us to consider the costs and bene- 
fits of alternative data programs and choose the data program 
that maximizes the difference between expected benefit 
and cost [15] . 

Costs include: 

• data collection and analysis 

• political cost if the census is not adjusted for undercount

• a smaller political cost if the census is adjusted

• additional administrative costs if two sets of books are kept

• other costs 

Benefits include: 

• more or less quantifiable benefits from more accurate 
allocations of funds and congressional apportionment 

• other benefits for the public and private sectors, less 
studied and at this point less quantifiable 
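Once such magnitudes are assessed, the decision rule itself is mechanical; the difficulty lies entirely in the quantification. A minimal sketch of the rule, with invented dollar figures:

```python
# Choose the data program maximizing expected benefit minus cost, per
# the decision-theoretic rule above.  All figures ($ millions) are
# invented for illustration.
programs = {
    "census, no adjustment":           {"cost": 5.0,  "benefit": 0.0},
    "census + PES-based adjustment":   {"cost": 12.0, "benefit": 25.0},
    "census + demographic adjustment": {"cost": 8.0,  "benefit": 15.0},
}

def net(name):
    """Expected benefit minus cost for a program."""
    return programs[name]["benefit"] - programs[name]["cost"]

best = max(programs, key=net)
print(best, net(best))
```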

More study is needed to assess the magnitudes of the benefits 
that can arise from better data and then to combine the 
different measures of benefit into one loss function. In par- 
ticular, in considering loss functions to represent inequity in 
fund allocations, we have been using the simple assumption 
that errors in allocations are roughly proportional to differ- 
ential errors in population figures. There are many allocation 
programs and they utilize population data in different ways. 
How should a single loss function (possibly a weighted sum
of separate loss functions, as in [8]) be devised to at least
approximately reflect these different uses of the data? How
sensitive will optimal adjustments be to these kinds of
variations in the loss function?

In this larger benefit-cost framework, we may still use loss 
functions to measure equity, but now we must assign a dollar 
value to inequity from errors in allocation of funds. For ex- 
ample, if inequity is measured by a constant times the sum
of absolute errors in allocations to States, how big is the
constant? To reduce errors in allocation by $100 million, is
it worth spending $10 million, $1 million, or $100,000? In 
general, the optimal decisions for the benefit-cost problem 
will be far more sensitive to the form of the loss function 


than were the optimal adjustments for undercount con- 
sidered above. 

Because of difficulties in quantifying benefits and costs, 
decision theory is not a sufficient basis for planning a census, 
but the analysis does sharpen understanding of the merits of 
alternative programs, including programs that adjust for 
undercount versus programs that do not adjust, and post- 
enumeration survey operations and demographic analysis 
operations of different magnitudes. 


Two kinds of equity have been considered, equity in 
allocations and equity in the adjustment process. Concepts 
of equity in allocations can be represented by decision- 
theoretic loss functions. Care is needed to ensure that the 
loss functions represent our concepts of equity and also 
make sense statistically. Examples were discussed. Optimal 
adjustments that minimize the expected value of the loss 
function can be found. Under the simple models considered, 
the optimal adjustments were insensitive to the form of the 
loss function. If this finding extends to more complicated 
models, it should diminish debate about how to construct 
the loss function. In particular, to maximize equity in the 
allocations to ethnic and other subgroups of the population, 
we need not consider these subgroups explicitly in devising 
a loss function, but should focus instead on governmental 
units to which allocations are made directly. 

Equity in the adjustment process concerns the likelihood 
that the underallocations to some areas will be increased 
by adjustment (because of errors in the estimates of under- 
count) and that some areas, which would not be underallo- 
cated under the original census data, will be underallocated 
under the adjusted data. These equity considerations imply 
optimal adjustments different than the optimal adjustments 
for equity in allocations. Tradeoffs among the different 
criteria of equity need to be considered. More work is 
needed to examine the sensitivity of the optimal adjustments 
to different tradeoffs. 

Estimation of the accuracy of the undercount estimates 
may be more important in affecting the adjustments than 
the concepts of equity applied. Such estimation will be diffi- 
cult and needs attention. 

In conclusion, decision-theoretic methods can be used 
to adjust the census for undercount. More work is needed 
for formulating loss functions, but the optimal adjustments 
do not seem to be sensitive to the loss function adopted. 
More work is also needed to assess the accuracy of the under- 
count estimates, but this work is necessary for any statis- 
tically solid approach to be used. The main disadvantage of a 
decision-theoretic approach is that it may be difficult to ex- 
plain to the public the rationale for the adjustments. I do 
not find this disadvantage overwhelming, noting that season- 
ally adjusted price indexes and labor force estimates are 
widely used, and that the postcensal estimates of population 

and per capita income used for general revenue sharing 
purposes are incredibly complicated. 

There are numerous advantages to the decision-theoretic 

• Bayesian decision theory forms a coherent framework 
for combining different kinds of information, e.g., 
demographic analysis and dual systems estimates. 

• The flexibility of the approach allows us to consider 
different kinds of effects of errors in data; e.g., errors 
in allocations to subgroups who do not receive alloca- 
tions directly. It also allows for appropriate uses of 
estimates of undercount that are accurate for some 
areas and inaccurate for other areas. 

• Statistical work is somewhat protected from political
  pressure.

• The decision-theoretic approach gives the most accur- 
ate final population figures (where the loss function 
serves as the measure of accuracy). 


1. Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 
Discrete Multivariate Analysis. Cambridge, Mass.: MIT 
Press, 1975. 

2. Black, H. C. Black's Law Dictionary, 4th Edition. St.
Paul, Minn.: West Publishing Co., 1968.

3. Ferreira, J. "Identifying Equitable Insurance Premiums 
for Risk Classes: An Alternative to the Classical Ap- 
proach." Division of Insurance, Commonwealth of 
Massachusetts, Automobile Insurance Risk Classifica- 
tion: Equity and Accuracy, 74-120. Boston: Massa- 
chusetts Division of Insurance, 1978. 

4. Hill, R. B., and Steffes, R. B. "Estimating the 1970 
Census Undercount for State and Local Areas." National 
Urban League Data Service, Washington, D.C. 1973. 

5. Jabine, T. B. "Equity in the Allocation of Funds Based
on Simple Data." U.S. Bureau of the Census, Small-
Area Statistics Papers, Series GE-41, No. 3. "Conference
on Small-Area Statistics, Boston, 1976." Washington,
D.C.: U.S. Government Printing Office, 1977, 2-8.

6. Keeney, R. L., and Raiffa, H. Decisions with Multiple 
Objectives: Preferences and Value Tradeoffs. New York: 
John Wiley and Sons, 1976. 

7. Keyfitz, N. "Information and Allocation: Two Uses of
the 1980 Census." The American Statistician 33, 2
(1979), 45-50.

8. Kish, L. "Optima and Proxima in Linear Sample De-
sign." Journal of the Royal Statistical Society, A 139 (1976),

9. Marks, E. S., Seltzer, W., and Krotki, K. J. Population 
Growth Estimation. New York: The Population Council, 

10. Raiffa, H., and Schlaifer, R. Applied Statistical Decision 
Theory. Cambridge: Harvard University Press, 1961. 

11. Robinson, J. G., and Siegel, J. S. "Illustrative Assess-
ment of Census Underenumeration and Income Under-
reporting on Revenue Sharing Allocations at the Local
Level." To appear in Proceedings of the ASA, Social Sta-
tistics Section, 1979.

12. Savage, I. R., and Windham, B. "Effects of Bias Removal 
in Official Use of United States Census Counts." The 
Florida State University, Department of Statistics, 
Tallahassee, Fla., 1973. 

13. U.S. Department of Commerce, Bureau of the Census. 
"Coverage of Population in the 1970 Census and Some 
Implications for Public Programs," by J.S. Siegel. Cur- 
rent Population Reports, Series P-23, No. 56. Washing- 
ton, D.C.: U.S. Government Printing Office, 1975. 

14. "Developmental Estimates of the Coverage of the
Population in the 1970 Census: Demographic Analysis,"
by J.S. Siegel, et al. Current Population Reports, Series 
P-23, No. 65. Washington, D.C.: U.S. Government 
Printing Office, 1977. 

15. Spencer, B. "Benefit-Cost Analysis of Data Used To
Allocate Funds: General Revenue Sharing." Unpub- 
lished doctoral dissertation, Yale University Statistics 
Dept., 1979. 

16. Stanford Research Institute. General Revenue Sharing 
Data Study, 4 vols. Menlo Park, Calif.: Stanford Re- 
search Institute, 1974. 

17. Strauss, R. P., and Harkins, P. B. "The Impact of Popula-
tion Undercounts on General Revenue Sharing Alloca-
tions in New Jersey and Virginia." National Tax Journal
XXVII (1974), 617-624.


Harry V. Roberts 

University of Chicago 


The papers by Drs. Fellegi and Spencer provide thoughtful 
and stimulating examinations of ways in which considera- 
tions of equity bear on census reporting of undercount. 
Fellegi concentrates on the decision of whether or not to 
adjust the census "headcount" in the light of available 
information about undercount. (The quotes around "head- 
count" are to remind the reader that there are certain 
departures from a literal headcount in obtaining the number 
so described.) His approach is based on a particular loss 
function that provides mathematical expression of equity. 
Spencer discusses a variety of loss functions, stresses the 
practical implications of these loss functions in terms of 
various ideas about equity, and outlines the interplay 
between loss functions and probabilistic expressions of data 
inaccuracy when one minimizes expected loss, as required by 
Bayesian decision theory. Both provide discussions of data 
inaccuracy. Fellegi describes Canadian studies that illustrate 
the types of statistical information that can be used in 
estimating undercount. He also offers suggestions on the 
optimal design of such studies. Explicit Bayesian ideas are 
more conspicuous in Spencer's paper, but both authors are 
mindful of the need for certain judgments about data 
inaccuracy if a reasoned approach to adjustment is to be
taken.


Both papers have contributed substantially to my under- 
standing of the issues of undercount. Both reach general 
conclusions that appear reasonable. Fellegi, citing a 
precedent but making no special argument, specifies a 
symmetrical squared-error estimation loss function as a 
quantitative expression of the concept of equity. He 
addresses inequity alleviated or caused by attempts to adjust 
headcounts in the light of evidence bearing on undercount. 
The latter is expressed in terms of a point estimator, possibly 
subject to uncertain sampling bias, and an accompanying 
standard error. He formulates the decision to adjust or not to 
adjust as a hypothesis-testing problem. Under certain 
reasonable assumptions about the sampling properties of the 
point estimator, and if sample sizes are not too small, his 
procedure is likely to point in the direction of adjustment. 

Spencer, like Fellegi, considers the equitable aspects of 
the decision of whether or not to adjust. His major attention, 
however, is focused on how adjustments should be made, 
that is, on how to estimate the undercount by reliance on a 
loss function that expresses equity. He concludes that the 
results are relatively insensitive to the precise specification of 
loss functions. 

In my discussion, I shall argue that, from the professional
perspective of statistics, there are virtually no controversial
questions of equity but enormous technical problems of
estimation. In developing my reasoning, I find the distinction
between statistical and political equity suggested by Robert 
Hill to be most useful. In the undercount context, political 
equity is expressed in allocation formulas of Congress, while 
statistical equity concerns the appropriate inputs to these 
formulas when the actual inputs presumably intended by 
Congress (for example, true population) are not available. 
Hence we have the focus of this conference on undercount, 
the discrepancy between what Congress presumably had in 
mind and the readily available headcount. 

Statistical equity is no different in estimation of under- 
count than in estimation of any other quantity of interest to 
public decision makers, such as the percentage unemploy- 
ment. The question is, "Given certain evidence, how do we 
make the point estimate?" Whether or not statisticians have 
viewed these problems in the decision-theoretic framework, 
they have implicitly proceeded as if a symmetric loss 
function were appropriate in estimation. For optimal design 
of such studies, one would in principle want to go further 
than specification of symmetry and one would have to assess 
numerical parameters of the loss functions. For estimation, 
given available evidence, symmetry is enough. Thus, although 
I argue for an absolute-error loss function instead of the 
squared-error loss function preferred by Fellegi, this is a
relatively minor point.

Estimation of undercount can be compared with esti- 
mation of various components of nonsampling error in 
Government surveys. Just as statisticians have not estimated 
undercount except in postcensus research, they do not 
usually estimate nonsampling errors in other surveys, 
although they may do research on these errors and present 
general conclusions about their nature and possible extent. 
The reason for such practices is to be found not in questions 
of equity but in the nature of the evidence and the available 
statistical tools. There is a strong inclination by statisticians 
to estimate only quantities for which there is general 
professional consensus as to how the estimates should be 
made. If different statisticians confronted with the same 
evidence would come up with widely varying estimates, there 

I am indebted to Bruce Spencer and Arnold Zellner
for helpful comments on drafts of this discussion.



would be reluctance to quote any estimate at all. This 
attitude indeed distinguishes statisticians from other 
professionals, such as intelligence analysts or economic 
forecasters, who are expected to make estimates even when it 
is understood that no professional consensus may exist. 

The conference has heard considerable discussion about 
the desirability of simplicity of estimation procedures. To 
the extent that simplicity reflects the desire for parsimonious 
statistical models whenever possible, simplicity is a 
desideratum. But the main desideratum is professional 
consensus. The estimates of unemployment and the seasonal 
adjustments thereof are far from simple, but statisticians 
agree that they, or closely similar estimates, are technically
sound.


Before beginning my central argument, I pause to mention 
a practical suggestion stimulated by the papers of Fellegi and 
Spencer. The idea is implicit also in other writings on 
undercount, and Dan Melnick has reminded me that it was 
actually adopted in the wording of the 1977 proposed census 
bill, H.R. 8871. 

I applaud the focus of both authors on undercount and 
statistical estimates of undercount, as opposed to adjusted 
estimates of population, because this focus suggests a simple 
resolution to one aspect of the controversies about census 
adjustment. It has been said that if census counts are 
adjusted, there would be a need to publish two sets of census 
numbers, the unadjusted headcounts and the adjusted 
population estimates, and this practice would be confusing. 
Confusion can be almost entirely avoided by the simple 
expedient of publishing headcounts and estimates of under- 
count; the latter down to whatever levels of disaggregation 
may be determined. Then a user desiring an adjusted estimate 
would have to perform the arithmetic of combining the 
headcounts and the undercount estimates, and hence would 
be forced to see the precise nature and extent of the 
adjustment, something easily forgotten when adjusted 
numbers are conveniently available. This publication policy 
would also palliate the problem posed by later revisions, 
since the headcounts themselves would remain unchanged 
even if the undercount estimates were later to be revised, say 
in the light of demographic analyses that would not become 
available as early as other indicators of undercount. Con- 
fusion and careless interpretation would be reduced. 

I regard the term "adjustment," which is by now 
hopelessly entrenched, as unfortunate from a semantic point 
of view. "Estimate the undercount" correctly suggests the 
nature of the problem. "Adjust the headcount" conjures up 
the image of cooking the data. 

Undercount can be defined and estimated either in 
absolute or relative terms. If relative undercount is to be 
reported, reporting considerations suggest a change of the 
customary definition of relative undercount, in which the 

denominator is the unknown true population value. My 
proposal is that relative undercount be defined with the 
known headcount as denominator. Thus an undercount of, 
say, 2 percent, would imply a needed upward revision of the 
headcount by 2 percent, rather than a division of the 
headcount by 1-0.02 = 0.98. In most instances, the 
difference would of course be small, but if users are to do 
arithmetic, the arithmetic should be as simple as possible. An 
alternative would be to report absolute undercount, but the 
use of percentages of estimated undercount, defined as 
suggested, gives a better picture of the adjustment. 
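The arithmetic difference between the two definitions can be made concrete; the population figures below are invented for illustration:

```python
headcount = 980_000
true_pop = 1_000_000   # unknowable in practice; assumed here to fix ideas

# Customary definition: denominator is the (unknown) true population.
u_true = (true_pop - headcount) / true_pop    # 0.0200
# Proposed definition: denominator is the known headcount.
u_head = (true_pop - headcount) / headcount   # about 0.0204

# Under the proposal, the user's arithmetic is a simple markup...
adjusted_proposed = headcount * (1 + u_head)
# ...whereas the customary definition requires a division.
adjusted_customary = headcount / (1 - u_true)

print(round(adjusted_proposed), round(adjusted_customary))  # both 1000000
```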


Bayesian language is convenient for communication 
because it has names for all relevant concepts. For this 
reason, I shall freely use Bayesian as well as non-Bayesian 
terminology in developing my argument. In so doing, I offer 
the assurance to non-Bayesians that I shall not advocate 
intrusion of private, "nondiffuse" judgments into the 
statistical analysis (which would be objectionable from our 
present perspective) and that an approximate non-Bayesian 
translation of my reasoning is possible. 

Estimation of undercount (adjustment of headcount) is a 
problem in Bayesian point estimation. The optimal point 
estimate minimizes expected estimation loss with respect to 
the posterior distribution of the quantity to be estimated. 
The practical problem is whether or not the posterior 
distribution of undercount can be regarded as "public"; 
that is, would statisticians be in essential agreement as 
to what it is? (For statisticians who wish to avoid Bayesian 
terminology, I would ask if there is near numerical agreement 
between their confidence intervals and the corresponding 
Bayesian credible intervals.) 

In many applications, the posterior distribution is public. 
Consider my earlier example, the estimation of unemploy-
ment. The estimates are obtained from the Current
Population Survey (CPS), and are widely accepted. (Con- 
troversy about these unemployment estimates has turned on 
problems of definition, such as the definition of a 
discouraged job seeker, rather than on the central process of 
sampling and estimation. There are also second-order 
controversies that do not affect the main point.) The key to 
consensus about unemployment estimation is the probability 
sampling design of the CPS and the consequent availability of 
what I call an essentially unbiased estimator of unemploy- 
ment in the sampled population. 

(By "essentially unbiased," I mean an estimator derived 
from a probability sample that may technically be subject to 
mathematical bias (consider the typical ratio estimate) but
that is not subject to uncertainty about selection bias due to 
such sources as nonresponse or nonprobability selection. 
From the perspective of "model-based analysis," which is 
now distinguished from "probability sampling," one might 
find similar consensus if the data permitted adequate 


diagnostic checking of the specifications or assumptions of 
the model.) 

Further, let us assume that an estimated standard error is 
obtainable, and the sampling distribution of the estimator 
can be approximated by a normal distribution. Then comes 
the essential Bayesian assumption: We assume that 
statisticians would agree that the prior distribution of 
unemployment is "diffuse." That is, all agree that, by 
comparison with the evidence conveyed by the sample 
estimate, the prior knowledge is to be accorded so little 
weight that, for practical purposes, differences in that 
knowledge can be ignored. 

The assumption about the point estimator discussed above 
implies an approximate likelihood function. (Technically, a 
diffuse prior distribution is nearly uniform in the range of 
parameter values for which the approximate likelihood func- 
tion is appreciably nonzero.) Then the posterior distribution 
of unemployment is approximately normal with mean at 
the point estimate and standard deviation equal to its stand- 
ard error. It is this posterior distribution that can be regarded 
as "public" in the sense that its inputs, the prior distribution and
likelihood function, could be agreed to (subject at most to
minor quibbles that would have little impact on the numer- 
ical result) by all statisticians. 

We have not yet come to the Bayesian point estimate, 
which is derived from the posterior distribution and a loss 
function by application of the principle of minimization of 
expected loss. For any symmetric, and otherwise well 
behaved, loss function, the appropriate Bayesian point 
estimate is the mean of the posterior distribution. Tech- 
nically, the estimator that is unbiased in the sampling sense 
leads to a Bayesian estimate that is unbiased a posteriori. 
That is, the conditional expectation of the estimate, given 
the data, is the target of estimation. 
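Under these assumptions the whole calculation is mechanical, which is what makes the posterior "public." A sketch, with invented numbers for the point estimate and its standard error:

```python
# With a diffuse prior and an approximately normal sampling distribution,
# the posterior is normal with mean at the point estimate and standard
# deviation equal to the standard error.  Figures are invented.
point_estimate = 0.021   # hypothetical estimated relative undercount
std_error = 0.004        # hypothetical standard error

# Bayes estimate under any symmetric (well-behaved) loss function:
# the posterior mean, which here equals the classical point estimate.
bayes_estimate = point_estimate

# 95% credible interval, numerically matching the classical 95%
# confidence interval under the same assumptions.
credible_interval = (point_estimate - 1.96 * std_error,
                     point_estimate + 1.96 * std_error)
print(bayes_estimate, credible_interval)
```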


Turn now to the estimation of undercount. Suppose 
that an infallible postenumeration survey is available. 
By infallible, I mean that in the sampled areas, all persons 
missed by the census are located. Hence an unbiased 
estimator and an accompanying standard error are available. 
By comparison with this information, we can regard the prior 
distribution as diffuse. Then the posterior distribution of 
undercount (overall or in any region of sufficient size) is 
approximately normal, with its mean at the point estimate 
and standard deviation equal to its standard error. The 
posterior distribution would be close to fully public, just as 
in the example of the previous section. Again, the non- 
Bayesian point estimate is the same as the Bayesian point 
estimate given a symmetric loss function. 

One central question of Fellegi's paper, and of the 
conference, is whether a statistical estimate of undercount 
should be made at all. The alternative to a statistical estimate 

is, in effect, to estimate undercount at zero and thus stick 
through thick and thin with the headcount. As I have 
formulated the scenario, the case for making the estimate 
(that is, adjusting the headcount) would be very strong. The 
only serious counter-arguments would be those of costs 
(extra data processing, computation, printing, etc.) or bad 
side effects (public confusion, temptation to tamper with 
"hard" numbers, etc.). In the absence of such considerations, 
the principles of decision theory apply: It is reasonable to 
think of the legislature or administrative agency as a single 
idealized decisionmaker, and the assumptions about loss 
function, prior distribution, likelihood function, and 
posterior distribution are about as compelling as I can 
imagine them to be. The statisticians can provide the 
posterior distribution. A symmetrical loss function expresses 
the statistical equity considerations in a manner that should 
accord with legislative intent, which is primarily concerned 
with political equity. The estimate of undercount follows. 

Moreover, the same reasoning applies widely to statistical 
work directed towards public policy, quite apart from the 
special considerations of equity that are so prominent here. 
If truth is a known number plus an uncertain number, and if 
the evidence on the uncertain number is of the kind I have 
sketched, there is hardly discretion, at least from statistical 
principles, about the need for a point estimate that reflects 
the available evidence about the uncertain number. 

Suppose, however, that the standard error of the esti- 
mated relative undercount were very large, say 0.10, as might 
be true if the estimate were based on a very small random 
sample. Few statisticians would be foolhardy enough to stick 
to the adjustment under this change of scenario. It is 
essential, however, to recognize the rational source of the 
reluctance to make an adjustment. It is that the assumption 
of a diffuse prior distribution would be hopelessly un- 
realistic under the circumstances. That is, the likelihood 
function would not be "sharp" with respect to any reason- 
able prior distribution, at least based on American 
experience. It is not a reluctance to adjust per se, but a 
sensible reluctance to adjust when other evidence suggests 
that the adjustment indicated by the sample estimate in 
question is inconsistent with other relevant information. 

The key to the problem of adjustment is whether or not 
the statistical evidence bearing on the adjustment is of the 
kind that can dominate other kinds of information bearing 
on undercount about which statisticians may reasonably 
disagree. (In my opinion, this type of reasoning suggests why 
statisticians are reluctant to adjust for nonsampling errors in 
ordinary survey practice.) 

If we consider an application in which one cannot take 
the prior distribution as diffuse with respect to the likeli- 
hood, we cannot hope to have the kind of expert consensus 
sketched above, which provides a support that statistical 
professionals find comfortable. If "comfortable" sounds 
self-serving, note that this kind of "comfort" has been
provided by the Census Bureau's long emphasis on 


probability sampling for public studies, which contrast with 
the nonprobability sampling designs employed extensively in 
private commercial surveys. 

This simplified scenario suggests that statisticians might be 
able to agree on the need for and the nature of adjustment if 
the evidence were based on some clean probability sampling 
evidence relating to undercount, and if they believed that 
any information besides that of the sample could be ignored. 

The scenario is oversimplified. It would be completely 
realistic only for an infallible postenumeration survey, one 
which, in sampled areas, located and counted all persons 
missed by the census. Even in this example, the scenario 
would apply only for those regions in which the sample size 
is large enough to justify the assumption that statisticians can 
agree that nothing serious would be lost by ignoring 
nonsample information about undercount. The effective 
sample size of the postenumeration survey can be enhanced 
by statistical ingenuity, as is illustrated in several of the 
papers of this conference. The integration of sample evidence 
coming from different sampling schemes, such as post- 
enumeration survey and reverse record checks, is probably 
within the range of statistical technology, and this integra- 
tion will be a major task regardless of the policy taken 
towards official estimation of undercount. 

The more difficult problem is an allowance for the 
inevitable bias stemming from the fact that the post- 
enumeration survey or other sampling schemes are far from 
infallible, being to some degree subject to the same sources 
of bias that lead to undercount in the census itself and being 
correctable only imperfectly by demographic analyses that 
take much longer to carry out than do the samples themselves. 
Uncertainty about this residual bias will be substantial; there 
will be strong opinions about its probable extent, and these 
opinions will probably vary from statistician to statistician. 
As a result, there will be no grounds to support a diffuse prior 
distribution, nor will there be substantial consensus about 
the appropriate nondiffuse prior distribution. 

However, estimation of components of undercount could 
be useful even if the postenumeration survey were partly 
fallible. Suppose that undercount consisted of two mutually 
exclusive components, people hard to find in the census who 
can be found in the postenumeration survey, and people 
impossible to find in either because they don't want to be 
counted. A postenumeration survey could provide an 
estimate for the first component even though it would fail 
completely with respect to the second. It would be useful to 
proceed with estimation of the first component in line with 
the principles just cited, even though the second component 
is estimated at zero despite knowledge that the true number 
has to be positive. In other words, estimates of one compo- 
nent of undercount can be worth making even if other com- 
ponents elude completely the statistical net. (The same prin- 
ciple would suggest that if a satisfactory estimate were avail- 
able for blacks but not for Hispanics, the black estimate should
be used for blacks and zero should be used for Hispanics.) 


Fellegi's symmetrical squared-error loss function is con- 
venient, often at least a good approximation to what is 
needed in applications, and widely discussed and used in 
statistics. I am not sure that it is a wise choice here. If one 
estimates the undercount in a given region, the estimation 
error will have, under many allocation formulas, the approxi- 
mate effect of a proportional and unintended transfer 
payment from or to the ith region. Quite independently of
allocation formulas, it will mean unintended gains to some 
people and losses to others. 

A positive unintended transfer payment in one region can 
be associated with equally unintended negative transfers, in 
aggregate, in all other regions, either in the sense that less 
funding for the program is available from a fixed total or 
that taxpayers in aggregate have to make a transfer that 
Congress did not intend. Similar reasoning applies to negative 
unintended transfers. 

Spencer raises the possibility that equity of subgroups, 
such as Hispanics or blacks, must be considered separately, 
although in his formulation, conclusions may not be sub- 
stantially changed by such consideration. My tentative 
position is that once one formulates a loss function that deals 
with unintended transfer payments by areas, no further 
consideration is needed of subgroups within the areas. 

The concept of transfer payment is well established and 
understood both in economic jargon and practical politics, 
and it seems natural to express estimation losses as the 
absolute value of any transfer payment not intended by 
legislation or administrative regulation but occasioned by 
errors in the census numbers used in allocation formulas. I 
agree with Fellegi that there is a symmetry in the practical 
consequences of errors in either direction, but my feeling is 
that loss is essentially linear in dollars within the ranges of 
unintended transfers occasioned by census inaccuracies. 

A change from squared-error loss to absolute-error loss 
would require some modification of Fellegi's calculations 
and make them less tractable. However, I believe that the 
essential conclusions would be intact. 

(In tidying up technical details, the problem posed by 
unbounded loss functions, which can lead to paradoxes, 
would have to be considered. My attitude is that all loss 
functions represent approximate assessments that must not 
be pressed literally when the model suggests that paradoxes 
arise by so doing.) 
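The practical difference between the two loss functions can be shown with a small numerical sketch (the posterior draws below are invented): squared-error loss is minimized by the posterior mean, absolute-error loss by the posterior median, and for a skewed distribution the two differ.

```python
# Point estimates under squared-error vs. absolute-error loss.
# Hypothetical posterior draws of an area's undercount rate (percent);
# the distribution is deliberately skewed so the two optima separate.
import statistics

draws = [1.0, 1.2, 1.3, 1.5, 6.0]

mean_est   = statistics.mean(draws)    # minimizes expected squared error
median_est = statistics.median(draws)  # minimizes expected absolute error

def expected_loss(est, loss):
    """Average loss of a point estimate against the posterior draws."""
    return sum(loss(est - d) for d in draws) / len(draws)

sq   = lambda e: e * e  # squared-error loss
abs_ = abs              # absolute-error loss
```

For an approximately normal (hence symmetric) posterior the mean and median coincide, so the choice of symmetric loss function changes the point estimate only when the posterior is skewed.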


Fellegi's paper and part of Spencer's presuppose that 
headcounts are a logical starting point for discussion of 
equity. I think that this presupposition is ill-advised. One can 
attach loss functions to inequities that may be created by 
past practice, but the attempt to do so seems academic when 
there is such general agreement about the potential inequities 


of differential undercount. Headcounts are not a near- 
sacrosanct starting point to be shifted only with great 
trepidation and with close attention to the distributive 
effects of the change on areas that would be worse off than 
under the present starting point. 

Suppose, for example, that undercount is more substantial 
in some race-age-sex groups than in others. The inequity, if 
any, would arise if allocations were made without taking 
advantage of all information that can be properly digested 
statistically, or if point estimates were based on inequitable 
estimation loss functions, not if there were simply a 
departure from a historical practice of reporting only 
headcounts. Groups that "lose" from better statistical 
estimation are treated inequitably only in the sense that they 
would be deprived of an historical advantage that reflected 
limitations of statistical methodology used in the past. In 
principle, the costs imposed on those who have acquired 
what might be considered a vested interest in the headcount 
could be incorporated in the loss function, but this would 
raise difficult and subtle questions that are dealt with in 
Spencer's paper. In my view, these questions are well 
bypassed; the argument above suggests that they can be 
bypassed without doing violence to the idea of equity, and 
their answers would not easily be found, as Spencer suggests. 
Moreover, the suggestion made by Keyfitz that any con- 
vention about adjustment be agreed upon in advance would 
tend to shift the focus toward the central equity issue rather 
than the issue of how things would be changed by a change 
in estimation of population from the census. 


The realization that there are important remaining biases 
in postenumeration or other matching surveys directed at 
estimation of undercount has had a negative effect on 
thinking about adjustments for undercount. In particular, it 
has discouraged exploration of the full potential of attainable 
adjustments using statistical techniques accepted by all 
statisticians. For the purpose of dealing with undercount in 
1980, we should simply accept that some allowances for 
undercount are possible within the framework of 
methodology accepted by statisticians. By this I mean what I 
developed in the preceding sections: Estimates of an 
important component of undercount can be made by 
application of probability sampling and accepted principles 
of statistical analysis. We should not hesitate to make these 
estimates for fear that allocation problems somehow entail 
equity considerations that pose challenges to standard 
statistical practice. Some of the papers at this conference 
have tackled what would be entailed. 

The remaining sources of undercount remain beyond 
present capability, but not necessarily future capabilities. We 
should encourage work that deals with them, whether at the 
level of improved field search procedures in postenumeration 
surveys, fuller development of other sources of information, 

more efficient methods of matching, or extensions of 
Bayesian methodology to what the paper by Dempster and 
Tomberlin calls "second-order undercount assessment." 


Given an approximately normal posterior distribution for 
undercount, any symmetric loss function serves to define the 
mean as a point estimate. There is a different question for 
which the details of the symmetric loss function are 
important. If the cost of obtaining the needed posterior 
distribution is taken to be nontrivial (either in the sense of 
computation or collection of new data), we have a design 
problem. For example, should we estimate undercount by 
States, metropolitan areas, counties, cities, towns, minor civil 
divisions, tracts, blocks, or what? Fellegi touches on design 
questions in his paper, and Spencer does so indirectly by 
subtraction of the costs of obtaining information from the 
total amount to be allocated. If the question is confronted 
formally, and if absolute error cost functions are accepted to 
be appropriate, then the constant of proportionality would 
have to be assessed judgmentally in order to make any formal 
analysis of design questions. The question would be some- 
thing like, "How much would Congress be willing to spend to 
eliminate an unintended transfer of one dollar?" Design 
questions like this are intrinsically harder than questions of 
analysis, and the solution is likely to be made more 
arbitrarily. The Congress may simply authorize a certain total 
expenditure for a postenumeration survey, and the Census 
Bureau may then have to stop disaggregating when the 
budget runs out. 


Unfortunately, it appears from presentations and discus- 
sions at this conference that all statistical sampling methods— 
postenumeration surveys and record matches— that can be 
applied down to small areas and yield relatively prompt 
results are more fallible than I had imagined them to be. For 
example, contrary to statistical folklore, the post- 
enumeration survey may be less thorough than the census 
itself. The Canadian matching studies reported by Fellegi are 
much to be envied. This is the basis for my introductory 
assertion that there appear to be no controversial questions 
about statistical equity but enormous technical difficulties, 
much greater difficulties than I had believed before coming 
to the conference. As a result, it will be impossible to have 
the same confidence in an estimate of undercount as in, say, 
the estimation of unemployment from the CPS. It may even 
be impossible to have professional consensus upon which 
estimates of undercount (adjustments of headcount) can be based.

Perhaps, however, consensus on posterior distributions, 
while sufficient, is not a necessary condition for professional 


consensus. Hence, I have begun to wonder if some alternative 
basis for consensus can suffice for the immediate problem. 
The discussion of synthetic estimation and related ideas, such 
as those of Purcell and Kish, may point the way in a useful direction.

In the synthetic method, we start with national estimates 
of undercount, based on demographic methods, by groups 
defined jointly by race, age, and sex. (These demographic 
estimates become available with greater delay after the 
completion of the census than do the results of the various 
sampling methods, such as the postenumeration survey and 
record matches.) Suppose the national demographic esti- 
mates are completely accurate. For any smaller area than the 
Nation as a whole— region, State, city, block, etc.— we apply 
the national estimates of relative underenumeration by race, 
age and sex to the headcount within each race-age-sex group 
within the area. 

If the resulting estimates are aggregated across all 
race-age-sex groups within the area, we obtain the synthetic 
estimate of undercount for the area. The method can be 
applied straightforwardly down to the smallest areas, as 
Robert Hill's paper explains. 
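The synthetic calculation itself is simple arithmetic, and a minimal sketch may make the uniformity postulate concrete. The group labels, rates, and headcounts below are invented; note also a convention choice that is an assumption of this sketch: the national rate is interpreted as the fraction of the *true* group population missed, so the missed count is recovered as counted × r/(1 − r). Applying the rate directly to the headcount is a simpler variant.

```python
# Synthetic estimation of an area's undercount: apply national undercount
# rates by race-age-sex group to the area's headcount in each group, then
# aggregate across groups. All rates and counts are hypothetical.

national_rates = {"group1": 0.08, "group2": 0.02, "group3": 0.05}

def synthetic_undercount(area_headcounts, rates):
    """Estimated persons missed in the area under the uniformity postulate."""
    missed = 0.0
    for group, counted in area_headcounts.items():
        r = rates[group]
        # If a fraction r of the true group population is missed, the counted
        # headcount is (1 - r) of the truth, so missed = counted * r / (1 - r).
        missed += counted * r / (1.0 - r)
    return missed

area_headcounts = {"group1": 10_000, "group2": 50_000, "group3": 5_000}
estimate = synthetic_undercount(area_headcounts, national_rates)
```

The method assumes every area experiences the national undercount rate within each group; the scatter-plot evidence discussed below is precisely a test of that assumption.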

What kind of professional consensus might be obtainable 
in support of undercount estimates based on the synthetic 
approach? The decision-theory skeleton cannot be fleshed 
out by sample evidence bearing directly on the areas for 
which undercount is to be estimated, nor can the regression- 
type methods discussed by other speakers be applied to 
achieve the same goal. The fundamental support for analyses 
like those presented by Fellegi and Spencer would be absent. 
If the Census Bureau were to estimate undercount by the 
synthetic approach, the resulting efforts would not have the 
firm technical support at which the Bureau has aimed since 
the development of probability sampling methods in the 1940s.

However, it may be helpful to explore further. With 
respect to any one area, a synthetic estimate could perhaps 
serve as the mean of a consensus prior distribution for
undercount. If further information about the area were 
unavailable, or if such information were of a kind as to leave 
a statistician undecided even as to the best direction of 

modification of the synthetic estimate, the synthetic esti- 
mate could perhaps serve as a reasonable point estimate of 
undercount. If the area were relatively large and 
heterogeneous, this conclusion seems more plausible than if 
the area were small and homogeneous. As one speaker put it, 
the undercount rate for blacks in a wealthy suburban area is 
likely to be lower than that in a ghetto area of the central city.

But the evidence presented in the scatter plot in Spencer's 
paper is disquieting. This shows a surprisingly low correlation 
across States between synthetic estimates and direct 
demographic estimates. The two methods suggest a different 
geographic picture of the incidence of undercount, and the 
dispersion among States is substantially lower for the 
synthetic method. Of course, Jay Siegel has pointed out that
more serious problems with demographic methods are entailed 
at the State level than at the national level. But this 
evidence does not rule out the possibility that the uni- 
formity postulate of the synthetic method is seriously wide 
of the mark. 


Although allowance for equity per se does not appear to 
present a hard problem, I find no easy suggestions as to how 
the Census Bureau should estimate undercount by sub- 
national areas. The traditional statistical path, which the 
Bureau has followed in its sampling studies, is limited by 
serious practical problems that I had not fully recognized. 
The synthetic method, or something like it, offers a possible 
way around these practical problems, but it appears to have 
no support in cross-validation. Whatever is done in 1980, the 
Bureau will fall short of providing information about 
undercount of a technical quality comparable to that of its 
other programs. (Of course, undercount looms as an un- 
noticed shadow in the background of all family and 
individual surveys conducted by the Bureau, including, for 
example, the CPS.) The one clear piece of advice is to continue
vigorous research into ways of making better estimates in 
the future. Here my feelings are suggested by the clause, "If 
we can put a man on the moon, ...." 

Floor Discussion 

Following Harry Roberts' discussion of the papers by
Fellegi and Spencer, a member of the audience asked Mr. 
Fellegi how his conclusion about adjustment would change if 
the measure of inequity were large in one State and small in 
another. As the large provinces in Canada account for 80 
percent of the population, how does the geographic situation 
come into consideration in making adjustments for inequity? 
It was conceded that the large provinces do dominate the 
measure, but that was not thought to be relevant to the 
decision-theoretic framework. It is relevant, however, in 
terms of the sample sizes needed and the allocations. The 
optimum allocation turns out to be fairly close to the square 
root of the population, which seems to indicate that the 
effect of the population is less in the overall picture than 
would be indicated otherwise. That is a peculiarity of the 
Canadian demography. In terms of the inequity framework, 
the logic would not be different if the population distri- 
bution were different. The kind of measure would take the 
form of a convention: One either adjusts or not, but one 
does not adjust only for some States, the winners, and not 
for those that would lose. Only an overall measure is 
proposed, with the decision being made ahead of the 
availability of counts. 

An alternative would be to decide on the basis of 
individual States whether to adjust or not. If one decided to 
adjust because one State has such a large amount of inequity, 
one would adjust for all States. 

It was suggested that perhaps the decision not to adjust 
could be made if inequity is small. If there are a lot of 
"medium" underenumerations, but none tremendously so, 
then the adjustment should be made if inequity is reduced 
by the previously agreed-upon adjustment. 

It was noted also that some of the implications from the 
two papers today contradict Ms. Slater's conclusions. For 
example, is it Messrs. Fellegi's and Spencer's contention that 
the larger the underenumeration, the larger the benefits that 

might come from an adjustment? In the second paper, the 
point was not to look at the black population per se but at 
localities with large concentrations of blacks and other 
minorities. By definition, an adjustment would move closer 
toward improving and making more equitable the flow of funds.

Illustrations were given of how to construct a measure of equity
designed to model legislative intent, both for a fixed-total
amount of allocation and for a per capita allocation formula.
Neither of these types of legislation relates to a
particular subgroup of the population, but rather to the 
amount of Federal funds flowing to the particular govern- 
ments. A model of the inequity in undercount was con- 
structed to see whether the inequity could be reduced, and 
this can then be tested. 

If adjustment is done in the decision-theoretic framework, 
it was felt that increased equity would be achieved overall, 
but it is also possible that some groups who would think they 
would gain might not, depending on the method of adjust- 
ment used. The decision-theoretic approach should be 
followed because equity is not fixed, as suggested in the 
Slater paper. Rather, it is flexible over time and it is 
unknown what the uses of the data will be after the data 
have been produced. 

The Bureau commented that loss functions in general are 
being considered. There is no reason why one cannot go 
beyond the 50 States and treat many interested groups, and 
thus mitigate the effects of the undercount for those groups. 
It would be proper to discuss the problem that way, if the 
legislation is framed so that the amount of money (per capita 
or fixed pie) that ought to go to those particular subgroups is 
specifically indicated. This is the crossover point between 
political and statistical equity. It is not the statistician's 
function to second-guess congressional intent; rather, it is
to see whether the intent is better met by adjustment
or not.


Appendix A 

Letter of Invitation for Conference Papers 




Bureau of the Census 

Washington, D.C. 20233 

As the 1980 census approaches, there is increasing interest in the population
counts and in measuring and possibly adjusting for any undercounts. Despite
the Bureau's best efforts to obtain a complete and accurate count, some
undercount is likely to occur.

Although the Bureau has been active in research concerning the undercount, 
we also are anxious to encourage as comprehensive a review of the undercount 
issue by the general research community as possible. A 2-day conference 
has been scheduled in Washington for February 25-26, 1980, to examine a 
broad range of technical concerns regarding the issue of adjusting for the 
undercount. An agenda that will accommodate approximately 10-12 invited 
and contributed papers is being planned, with discussants for each. Attend- 
ance at the conference will be limited to listed participants and a small 
additional group of invited persons to assure efficient use of the 2 days. 
The papers will be refereed by a steering committee that has been assembled 
to plan and conduct the conference. Authors whose papers are used in the 
conference will receive a $1,000 honorarium, plus travel and expenses for 
participation. The proceedings of the conference will be published. 

We would like to invite you to prepare a paper on any of the issues listed 
below or other closely related topics dealing with the undercount issue 
according to your area of special interest. 

The issues to be examined include: 

1. Methods of measuring the undercount for subnational areas, 
including the quality of the estimates of undercount in 
relation to the size and other characteristics of the area; 
and the feasibility of providing accuracy checks or con- 
fidence intervals. 

2. The timing of the adjustment(s) for undercount.

3. Measuring and adjusting for the undercount and misreporting 
for factors other than total population, such as social and 
economic characteristics. 



4. The use of adjusted figures in Federal programs and the impact
of adjustments on the Federal statistical system. 

5. Political and legal issues in making adjustments to the 
census counts. 

6. The effects of adjustments on equity in the distribution of 
Federal funds. 

7. Decision theory and theoretical aspects of adjustment. 

Although the list is not exhaustive and many of the issues noted may not be 
mutually exclusive, they represent some of the principal questions to be 
discussed at the conference. The acceptance of the proposed papers will 
be left to the steering committee with the goal of a proper balance in 
the treatment of the various questions at the conference. In order that 
the steering committee might establish a list of authors by the end of 
September, your proposal for a paper should be received by September 15. 
Dr. Conrad Taeuber (Georgetown University, Center for Population Research, 
Washington, D.C. 20057) is a consultant to the steering committee and the
Census Bureau in the organization and management of the conference, and 
I suggest you contact him directly in responding to this invitation. 

The Bureau considers the conference an important step in guiding us in 
dealing with the undercount issue, and we look forward to a favorable 
response from you. 





Bureau of the Census 

Request for Papers

Distributed to selected college and university departments and at the 1979 annual 
conference of the American Statistical Association, Washington, D.C. 

Arrangements are being made for a 2-day conference February 25-26, 1980, in 
Washington, D.C. to assess the feasibility of measuring and adjusting for census 
undercounts at different levels of geography and for selected characteristics, and to 
discuss possible techniques and approaches for both measuring and adjusting for the 
undercounts. Although the Census Bureau has itself been active in research concerning 
the undercount, the conference is being organized to obtain the views of others on a 
broad range of technical concerns regarding undercount adjustments. The purpose of this 
notice is to solicit papers for the conference. 

A conference agenda is being planned that will accommodate approximately 10-12 
papers, with discussants for each, examining such issues as (1) the methods available for 
measuring and adjusting for the undercount for subnational areas, (2) the timing of the 
adjustments, (3) the feasibility of extending the adjustments to population characteristics 
beyond total population, (4) the use of adjusted figures in Federal programs, (5) the 
political and legal issues in making adjustments to the census counts, (6) the effects of 
adjustments on equity in the distribution of Federal funds, and (7) decision theory and 
the theoretical aspects of adjustment. Papers are solicited on these or closely related 
topics dealing with the undercount issue according to your area of special interest and expertise.

The papers will be refereed by a steering committee that has been assembled to plan 
and conduct the conference. Authors whose papers are used in the conference will receive 
a $1,000 honorarium, plus travel and subsistence expenses for participation in the 
conference. A final decision on the acceptance of the papers will be left to the steering 
committee with the goal of a proper balance in the treatment of the various questions at 
the conference. In order that the steering committee might establish a tentative list of 
authors by the end of September, your proposal for a paper should be received by 
September 15. Proposals for papers should be sent directly to Dr. Conrad Taeuber 
(Georgetown University, Center for Population Research, Washington, D.C. 20057), who 
is serving as a consultant to the steering committee and the Census Bureau in the 
organization and management of the conference. 


Notice Inviting Papers 

Appeared in Amstat News, Number 58, American Statistical Association 
(September-October, 1979), p. 4. 

The Bureau of the Census invites papers outlining innovative methods for measuring 
the completeness of the 1980 Census. Papers will be reviewed by a panel of consultants 
who will determine which ones are to be presented to a conference on the completeness 
of the census early in 1980. Authors of accepted papers will receive $1,000 and will be 
invited to present their papers at the conference. The deadline for receipt of papers is 
November 30, 1979. Manuscripts and inquiries should be submitted to Conrad Taeuber, 
Georgetown University, Washington, D.C. 20057. 


Appendix B 

Attendees At Conference On 

Census Undercount February 25 - 26, 1980 



Census Advisory Committee on the Asian 

and Pacific Americans Population 

1525 S. Brockton Avenue, #9 

Los Angeles, California 90025 

BAILAR, Barbara A. 

U.S. Bureau of the Census 

BAILEY, Ronald 

Executive Director 
Illinois Council for Black Studies 
African-American Studies Program 
Northwestern University 
Evanston, Illinois 60201 

BARABBA, Vincent P. 

U.S. Bureau of the Census 


U.S. Bureau of the Census 

BEALE, Calvin L. 

Leader, Population Studies 
Economic Development Division 
ESCS, Department of Agriculture 
Room 492, 500 12th Street, S.W. 
Washington, D.C. 20250 

BECKER, Patricia C. 

Data Coordination Division 
Planning Department 
City of Detroit 
34 Cadillac Tower 
Detroit, Michigan 48223 

BOHME, Frederick 

U.S. Bureau of the Census 


U.S. Bureau of the Census 

BRACE, Kimball 


Election Data Services 
1500 Massachusetts Avenue 
Washington, D.C. 20005 

BRACEY, Karen 

Operations Research Analyst 
U.S. General Accounting Office 
441 G Street, N.W. 
Washington, D.C. 20548 

BRYCE, Herrington J. 

The Academy for Contemporary Problems 
400 N. Capital Street, N.W. 
Washington, D.C. 20001 

BUSH, Carolee 

U.S. Bureau of the Census 

BUTLER, Matthew 

Office of Revenue Sharing 
U.S. Department of the Treasury 
2401 E Street, N.W. 
Washington, D.C. 20226 


New York State Legislative 

Advisory Task Force on Reapportionment 

250 Broadway - 20th Floor

New York, New York 10007 

CARTER, Barbara L. 

Associate Vice President 
University of the District of Columbia 
1638A Beekman Place, N.W. 
Washington, D.C. 20009 

CAYWOOD, Craig P. 

Staff Associate 
National League of Cities 
1620 I Street, N.W. 
Washington, D.C. 20006 

CHILTON, Stephen 

Analyst in Social Legislation 
Congressional Research Service 
Library of Congress 
Washington, D.C. 20540 

COPELAND, Kennon

U.S. Bureau of the Census 




U.S. Bureau of the Census 

COWAN, Charles 

U.S. Bureau of the Census 

FARRAR, Eleanor 

Vice President 

Joint Center for Political Studies 
1426 H Street, N.W., Suite 539 
Washington, D.C. 20005 



Brown University 

Box F 

Providence, Rhode Island 02912

DARCY, Jacquie 

Market Growth, Inc. 
733 15th Street, N.W. 
Washington, D.C. 20005 

DAVIS, Martha 

Special Assistant to the Chief Economist 
U.S. Department of Commerce 
14th and Constitution Avenue, Rm. 4850 
Washington, D.C. 20230 

DAVIS, Vicki 

U.S. Bureau of the Census 

DOUGLAS, Arietta 

Office of the Mayor 

City of Detroit 

Detroit, Michigan 48226 

DOYLE, Brian 

Director, Evaluation and User Services Section 
Population Census Branch 
Australian Bureau of Statistics 
Box 10 Belconnen 
A.C.T. 2612, Australia 

ECKLER, A. Ross 

3635 Edelmar Terrace 

Silver Spring, Maryland 20906 

ENGELS, Richard A. 

U.S. Bureau of the Census 

ERICKSEN, Eugene P. 

Sampling Statistician 
Institute for Survey Research 
Temple University 
Philadelphia, Pennsylvania 19122 


U.S. Bureau of the Census 

FAY, Robert E., III

U.S. Bureau of the Census 


Assistant Chief Statistician 
Statistics Canada 
Tunney's Pasture 
Ottawa, Canada K1A 0T6


U.S. Bureau of the Census 

FERBER, Robert 

Director, Survey Research Laboratory 
University of Illinois 
1005 W. Nevada 
Urbana, Illinois 61801 


Executive Vice President/General Manager 
Sheridan Broadcasting Network 
1745 Jefferson Davis Highway 
Arlington, Virginia 22202 


American Demographics Magazine 
501 Warren Road
Ithaca, New York 14850 

FRANCIS, Mildred E. 


Francis and Sobel, Associates 

P.O. Box 5940 

Santa Fe, New Mexico 87502 

FRANKEL, Lester R.

Executive Vice President 
Audits and Surveys, Inc. 
One Park Avenue 
New York, New York 10016 

GARCIA, L. Manuel 
1111 North Broadway 
Santa Ana, California 92701 

GARCIA, Robert 

U.S. House of Representatives 
Washington, D.C. 20515 


GERSON, Earle J.

U.S. Bureau of the Census 

GERTZ, Herbert E. 

Vice President 
Government Services 
Donnelley Marketing 
1301 W. 22nd Street 
Oak Brook, Illinois 60521 

HANEL, Richard S.

Vice President and Manager 
Urban Statistical Division 
R.L. Polk and Company 
431 Howard Street 
Detroit, Michigan 48231 


U.S. Bureau of the Census 


U.S. Bureau of the Census 


7012 Wilson Lane 
Bethesda, Maryland 20034 

HERMAN, Richard 

Election Data Services 
1500 Massachusetts Avenue 
Washington, D.C. 20005 


U.S. Bureau of the Census 


Office of Federal Statistical Policy and Standards 
U.S. Department of Commerce 
2001 S Street, N.W. 
Washington, D.C. 20230 

HILL, Robert 

National Urban League 
Research Department 
733 15th Street, N.W. 
Washington, D.C. 20005 

GOULD, Elizabeth B. 

Managing Editor 
Asian and Pacific Census Forum 
East-West Population Institute 
1777 East-West Road 
Honolulu, Hawaii 96848 


Professor of Economics and Management and Private 

Oakland University and Karl D. Gregory and 

18495 Adrian 
Southfield, Michigan 48075 


Federal Election Commission 
1325 K Street, N.W. 
Washington, D.C. 20463 


Congressional Research Service 
Library of Congress 
Washington, D.C. 20540 

JONES, Charles D. 

U.S. Bureau of the Census 

JORDAN, Clifton 

U.S. Bureau of the Census 



Office of Federal Statistical Policy and Standards 

U.S. Department of Commerce 

2001 S Street, N.W., Room 702 

Washington, D.C. 20230 

GUERNICA, Antonio 

National Association of Spanish Broadcasters 
2550 M Street, N.W. 
Washington, D.C. 20037 

HALL, George E. 

U.S. Bureau of the Census 

JUAREZ, Rumaldo 

Office of Evaluation and Technical Analysis 
Department of Health, Education, and Welfare 
Humphrey Building 
200 Independence Avenue, S.W. 
Washington, D.C. 20201 

KADANE, Joseph B. 

Professor and Head, Department of Statistics 
Carnegie-Mellon University 
Pittsburgh, Pennsylvania 15213 

KAPLAN, David L. 

204 Belton Road 

Silver Spring, Maryland 20901 



U.S. General Accounting Office 
Room 3844, 441 G Street, N.W. 
Washington, D.C. 20548 

KEYFITZ, Nathan 

73 Edgemoore Road 
Belmont, Massachusetts 02178 

MANN, Evelyn S. 

Director, Population Research 

New York City Department of City Planning 

2 Lafayette Street 

New York, New York 10007 

MARKS, Eli S. 

U.S. Bureau of the Census 

KISH, Leslie 


Institute for Social Research 
The University of Michigan 
Ann Arbor, Michigan 48106 

KNAPP, Malcolm M. 


Malcolm M. Knapp, Inc. 

26 East 91st Street 

New York, New York 10028 

KOCH, Gary G. 

Professor of Biostatistics 
University of North Carolina 
Chapel Hill, North Carolina 27514 

KONG, Florence 

California State Office of Economic Opportunity 
555 Capitol Mall 
Sacramento, California 95814 


Research Associate 
Woodstock Center 
Georgetown University 
Washington, D.C. 20057 

MARTIN, Howard N. 


Research Department 

Houston Chamber of Commerce 

P.O. Box 53600 

Houston, Texas 77052 

MARTIN, Margaret E. 

Senior Research Associate 
Committee on National Statistics 
National Research Council 
National Academy of Sciences 
2101 Constitution Avenue, N.W. 
Washington, D.C. 20418 

McARTHUR, Edith 

U.S. Bureau of the Census 

McCULLUM, Donald P. 


Alameda County Superior Court 


Oakland, California 94612

McKENNEY, Nampeo 

U.S. Bureau of the Census 

LEVINE, Daniel B. 

U.S. Bureau of the Census 


Staff Assistant 
Civil Service Subcommittee 
House of Representatives 
Washington, D.C. 20515 

LOVE, Larry 

U.S. Bureau of the Census 

MELNICK, Daniel 

Specialist, Survey Research 
Congressional Research Service 
Library of Congress 
Washington, D.C. 20540 

MILLER, Rita S. 

Research Director 
Borough of Brooklyn 
The City of New York 
Brooklyn Civic Center 
Brooklyn, New York 11201

MADOW, William G. 

Committee on National Statistics 
National Research Council 
2101 Constitution Avenue, N.W. 
Washington, D.C. 20418 

MILTON, Barbara 

U.S. Bureau of the Census 


U.S. Bureau of the Census 



Graduate School of Business 
University of Pittsburgh 
Pittsburgh, Pennsylvania 15261 

MORRIS, Milton D. 

Director of Research 
Joint Center for Political Studies 
1426 H Street, N.W. 
Washington, D.C. 20005 

MORRIS, Sandra 

U.S. Bureau of the Census 

MOSES, Lincoln E. 


Energy Information Administration 

Department of Energy 

Room 4311, Federal Building

12th & Pennsylvania Avenue, N.W. 

Washington, D.C. 20461 

MOYNIHAN, Daniel P.

United States Senate 
Washington, D.C. 20510 

NELSON, Bryce 

Los Angeles Times 

1700 Pennsylvania Avenue, N.W. 

Washington, D.C. 20006 

NEWMAN, Dorothy K. 
(Consultant and Lecturer) 
3508 Woodbine Street 
Chevy Chase, Maryland 20015 

O'HARE, William P. 

Deputy Director 

National Social Science & Law Project 
1990 M Street, N.W., Suite 610 
Washington, D.C. 20036 

PASSEL, Jeffrey

U.S. Bureau of the Census 

PEGUES, Francene 

Office of the Mayor 

City of Detroit 

Detroit, Michigan 48226 


U.S. Bureau of the Census 


New York Times 

1000 Connecticut Avenue, N.W. 

9th Floor 

Washington, D.C. 20036 

ROBEY, Bryant 


American Demographics Magazine 

P.O. Box 68 

Ithaca, New York 14850 

ROBERTS, Harry V. 

Professor of Statistics 
Graduate School of Business 
University of Chicago 
1101 East 58th Street 
Chicago, Illinois 60637 

ROBINSON, J. Gregory 

U.S. Bureau of the Census 


Morning News Department 
Baltimore Sun 
501 North Calvert Street 
Baltimore, Maryland 21203 

SALTER, Douglas C. 


National Planning Data Corporation 

P.O. Box 610 

Ithaca, New York 14850 

SAMORA, Julian 

Department of Sociology 
University of Notre Dame 
Notre Dame, Indiana 46556 


Jackson State University 
Department of Mass Communication 
P.O.Box 18067 
Jackson, Mississippi 39217 

POWERS, Mary G. 


Fordham University 

Bathgate Avenue & Fordham Road 

(Bronx) New York, New York 10458 

SAVAGE, I. Richard

Department of Statistics 

Yale University 

Box 2179, Yale Station 

New Haven, Connecticut 06520 


SCHEUREN, Frederick J. 

Acting Chief Mathematical Statistician 
Social Security Administration 
1875 Connecticut Avenue, N.W. 
Washington, D.C. 20009 


U.S. Bureau of the Census 

SEIDMAN, David 

Senior Social Scientist 
The Rand Corporation 
2100 M Street, N.W. 
Washington, D.C. 20037 

SIEGEL, Jacob S. 

U.S. Bureau of the Census 


Denver Planning Office 
1445 Cleveland Place 
Denver, Colorado 80207 

SLATER, Courtenay M. 

Chief Economist 

U.S. Department of Commerce 

Washington, D.C. 20230 

SMITH, Henry H. 

U.S. Bureau of the Census 

SMITH, Wray 

Technical Director, Office of Evaluation and Technical 

Department of Health, Education, and Welfare 
Humphrey Building 
200 Independence Avenue, S.W. 
Washington, D.C. 20201 

SMOLKA, Richard G. 


School of Government and Public Administration 

American University 

Washington, D.C. 20015 

SNIDER, Patricia 

General Motors Corporation 
3044 West Grand Blvd. 
Detroit, Michigan 48202 

SPAR, Edward J. 


Market Statistics 

633 Third Avenue 

New York, New York 10017 


Assistant Professor of Education Statistics and Policy 
School of Education 
Northwestern University 
2003 Sheridan Road 
Evanston, Illinois 60201 

STANTON, Howard R. 

Puerto Rican Migration Research Consortium 

205 Lexington Avenue 

New York, New York 10016 

STEIN, Robert L. 

Assistant Commissioner for Current Employment Analysis 
Bureau of Labor Statistics 
Department of Labor 
441 G Street, N.W. 
Washington, D.C. 20212 

STRAUSS, Robert P. 

Professor of Economics and Public Policy 
Carnegie-Mellon University 
Margaret Morrison Hall 202 
Pittsburgh, Pennsylvania 15213 

TAEUBER, Conrad 


Center for Population Research 
Georgetown University 
Washington, D.C. 20057 

TELLA, Alfred 

U.S. Bureau of the Census 

TIPPS, Havens C. 

Project Director 

U.S. Commission on Civil Rights 
1121 Vermont Avenue, N.W. 
Washington, D.C. 20426 


Department of Statistics 
Harvard University 
One Oxford Street 
Cambridge, Massachusetts 02138 

TURNER, Connie 

Volunteers in Service to America 
806 Connecticut Avenue 
Washington, D.C. 20525 

TURNER, Marshall 

U.S. Bureau of the Census 


UNO, Tad 

D.C. Chapter, Japanese American Citizens' League 
5316 Landgrave Lane 
Springfield, Virginia 22151 


Vice President 
Westat, Inc. 
11600 Nebel Street 
Rockville, Maryland 20852 

WALLMAN, Katherine 

U.S. Department of Commerce 
14th and Constitution Avenue 
Washington, D.C. 20230 


Vice President 

Director of Research 

SAMI 40-50 

Time & Life Building 

Rockefeller Center 

New York, New York 10020 

WEISS, Michael 

People Magazine 
888 16th Street, N.W. 
Washington, D.C. 20006 

WEISZ, Martha 

Special Assistant 

Federal Services Subcommittee 

U.S. Senate 

Room 6206, Dirksen Senate Office Building 

Washington, D.C. 20510 

WETZEL, James 

U.S. Bureau of the Census 

WHITE, Nathaniel 

National Institutes of Health 
Landow Building 
791 Woodmont Avenue 
Bethesda, Maryland 20205 



Joint Center for Political Studies 
1426 H Street, N.W. #539 
Washington, D.C. 20005 


U.S. Bureau of the Census 

WRIGHT, Tommy 

Research Associate 
Union Carbide Corporation 
Nuclear Division 
P.O. Box Y, Bldg. 9704-1 
Oak Ridge, Tennessee 37830 

YOUNG, Arthur F. 

U.S. Bureau of the Census 


Department of Health, Education, and Welfare 
Humphrey Building 
200 Independence Avenue, S.W. 
Washington, D.C. 20201 

ZITTER, Meyer 

U.S. Bureau of the Census 


Associate Program Director for Sociology 
National Science Foundation 
1800 G Street, N.W. 
Washington, D.C. 20550 

Appendix C 

Conference Program 


Sheraton National Hotel, Arlington, Virginia 


6:00 - 9:00 p.m. Sunday, February 24, 1980 
8:00 - 9:00 a.m. Monday, February 25, 1980 

Conference Overview 

The conference will examine such 
issues as: (1) methods available for measuring 
and adjusting for the undercount for sub- 
national areas; (2) the feasibility of extending 
the adjustments to population characteristics 
beyond the total population; (3) the use of 
adjusted figures in Federal programs; 
(4) the political and legal issues in making 
adjustments to the census count; and (5) the 
effects of adjustment on equity in the distribu- 
tion of Federal funds. 

The conference is designed to provide 
a forum to consider new approaches to 
measuring the census undercount and to 
assess the implications of adjusting the census 
counts. The conference will (1) bring together 
recognized experts to present and discuss 
technical papers on the census undercount, 
and (2) permit an exchange of ideas on a 
broad range of undercount issues. 

February 25 


9:00 - 9:45 a.m. Introductory Remarks 

Vincent P. Barabba Robert Garcia 

9:45 - 10:15 a.m. The Census Bureau Experience and Plans 

Jacob S. Siegel Charles D. Jones 

10:15 - 10:30 a.m. COFFEE BREAK 

10:30 a.m. - 12:30 p.m. Adjustment Pro and Con 

Topic-Facing the Fact of Census Incompleteness - Nathan Keyfitz 
Topic-Adjusting for the Decennial Census Undercount: 
An Environmental Impact Statement - Peter K. Francese 
Discussant-Robert P. Strauss 
Floor Discussion 

12:30 - 2:00 p.m. LUNCHEON PROGRAM 

Topic-The Congressional Experience - Daniel P. Moynihan 

2:00 - 3:45 p.m. 

Methodological Considerations 

Topic-Can Regression be Used to Estimate Local 
Undercount Adjustments? - Eugene P. Ericksen 
Topic-Modifying Census Counts - Richard Savage 
Discussant-William G. Madow 
Floor Discussion 

3:45 - 4:00 p.m. COFFEE BREAK 

4:00 - 5:30 p.m. 
Methodological Considerations* 

Topic-Diverse Adjustments for Missing Data - Leslie Kish 
Topic-Some Empirical Bayes Approaches to 
Estimating the 1980 Census 

Discussant-Tommy Wright 
Floor Discussion 


Impact of Adjusting 

Topic-Federal Program Impacts of 
1980 Census Undercoverage 
Adjustment- Courtenay M. Slater 

Topic-Issues on the Impact of Census 
Undercounts on State and Local 
Government Planning 
Herrington J. Bryce 

Discussant-Wray Smith 
Floor Discussion 

6:00 p.m. 



The International Experience 

Topic-The Australian Experience- 
Brian Doyle 

Topic-Summary of Other 

International Experience - 
Meyer Zitter 

Floor Discussion 

*Late addition to program — 
A. P. Dempster and T. J. Tomberlin 


February 26 


9:00 a.m. - 12:00 Noon Other Methods and Impacts 

Topic-The Synthetic Method: Its Feasibility for 
Deriving the Census Undercount for 
States and Local Areas - Robert B. Hill 

Discussant- Joseph Waksberg 

Topic-The Impact of an Adjustment to the 
1980 Census on Congressional and 
Legislative Reapportionment - Carl P. Carlucci 

Discussant- Ben J. Wattenberg 
10:15 - 10:30 a.m. COFFEE BREAK 


Topic-Legal and Constitutional Constraints on 
Census Undercount Adjustment - Donald P. McCullum 

12:00 - 1:30 p.m. LUNCHEON 


1:30 - 3:00 p.m. Equity Considerations 

Topic-Should the Census Count be Adjusted 
for Allocation Purposes? Equity 
Considerations - Ivan P. Fellegi 

Topic-Implications of Equity and Accuracy 
for Undercount Adjustment: 
A Decision-Theoretic Approach - Bruce Spencer 

Discussant-Harry V. Roberts 

3:00 - 3:15 p.m. COFFEE BREAK 


3:15 - 5:00 p.m. Recap and Concluding Discussion 

